I INTRODUCTION


A. General 

This annual report describes the work performed during the most
recent year of a program of research in the field of artificial
intelligence. The work reported here began in October 1971, and is
itself the continuation of work performed under a previous contract.*
Therefore this is a report on the most recent accomplishments and
status of a continuing research program.

During the course of the year we have documented the details of our
technical work in a series of reports, journal articles, and
presentations.† Our intention in this report is to provide an overview
of the project, rather than to reproduce those details here.

B. Background 

For a number of years our work has been focused on the application
of artificial intelligence techniques to the control of a mobile
automaton, a robot nicknamed Shakey, in an actual laboratory
environment. This work reached its first plateau in 1969 with the
completion of the first integrated robot system: a mobile vehicle
equipped with a TV camera and other sensors, and controlled by an
SDS 940 computer. During the following two and a half years we
developed a new, more powerful robot system. While the robot vehicle
remained substantially unchanged, the old SDS 940 computer was
replaced by a PDP-10/PDP-15 facility with significantly more
capability. The software controlling the robot was completely
redesigned to incorporate more general and powerful methods both for
solving robot problems in the abstract and also for executing the
solutions in the real world. This second phase of our activity
reached a plateau during the past year, by which time we had completed
the design and implementation of the entire system and had carried out
a series of experiments to explore its strengths and weaknesses.

* Contract NASW-2164.

† These are listed in Section VI.

As this work progressed we developed preliminary ideas for methods 
that would dramatically increase the capabilities of the robot in 
several directions. It was clear that these methods could not be included 
in the existing system in any convenient way. Accordingly, we elected 
to devote most of our resources to the task of designing a new robot 
system that would encompass our new ideas for robot problem solving, 
perception, and real-world execution monitoring. Thus, our work during 
the past year has been divided between completing some tasks associated 
with the existing robot software and beginning design studies for a new 
system. 

C. Report Outline 

Section II of this report presents our recent work in robot problem 
solving. It describes some experiments performed on the existing system, 
and outlines some of our ideas for a new problem solver. Section III 
discusses the design of a robot perception system for analyzing pictorial
and range data. In Sections IV and V we describe the software and hardware 
support activities associated with our research. Finally, Section VI 
lists the publications and presentations that were prepared or presented 
during the project period. 
II PROBLEM SOLVING


A. Introduction 

Our research on automatic problem solving has as its goal the
development of systems that can plan and execute sequences of actions
for a robot. In our formulation, we assume that the robot is given some
command such as "push the small box next to the large box," and the
problem solver then creates a plan for accomplishing the task. The plan
consists of a sequence of motor-action programs, such as "move to
position X = 3.2, Y = 4.6." After a plan is generated, we desire that
it be executed "intelligently," that is, with due regard for the actual
effects of each action.

During the last two years we have developed a plan generating system
called STRIPS, a plan executing system called PLANEX, and a learning
system that generalizes and saves plans produced by STRIPS. The present
status of these systems is well documented in Ref. (1),* so we shall not
describe details here. We have also produced a 25-minute, 16-mm, sound
film (Ref. 2) that depicts STRIPS and PLANEX in action controlling our
mobile robot, SHAKEY. During the past year we have made some
improvements to these systems and conducted some experiments
illustrating performance on some learning tasks. These developments are
discussed in more detail in the next section.

Our work with STRIPS has clarified some of its limitations, and we
have begun to think about how they might be overcome. The results of
some of our speculations on this subject are contained in a paper
(Ref. 3) given at the last Machine Intelligence workshop. Recently we
have begun the design of a new problem solving system. Progress on this
design will be described later in this report.

* References are listed at the end of this report.

B. Description of Work on the STRIPS-PLANEX System 
1. Development and Refinements 

During the preceding year we continued development and
experimentation with plan generalization procedures for the
STRIPS-PLANEX system. We can illustrate some of the issues we have
dealt with by considering the following example. Assume that adjacent
rooms R1 and R2 are connected by door D1, the robot is in room R1, box
B1 is in room R2, and the task is to bring box B1 into room R1.

If STRIPS had available the appropriate GOTHRU and PUSHTHRU
operators, then it could form the two-step plan:

     GOTHRU(D1,R1,R2)        [Go through door D1 from room R1 into room R2]

     PUSHTHRU(B1,D1,R2,R1)   [Push B1 through door D1 from R2 into room R1].

While this sequence solves the original task, it probably doesn't
warrant being saved for the future unless, of course, we expect that
the robot would often need to go from room R1 through door D1 to room
R2 to push back the specific box, B1, through door D1 into room R1. We
would like to generalize the plan so that it could be free from the
specific constants, D1, R1, R2, and B1, and could be used in situations
involving arbitrary doors, rooms, and boxes.

In considering possible procedures for generalizing plans we
first rejected the naive suggestion of merely replacing each constant in
the plan by a parameter. Some of the constants may really need to have
specific values in order for the plan to work at all. For example,
consider a modification of our box fetching plan in which the second
step of the plan is an operator that only pushes objects from room R2
into room R1. The specific plan might then be

     GOTHRU(D1,R1,R2)

     SPECIALPUSH(B1).

When we generalize this plan we cannot replace all constants by
parameters, since the plan only works when the third argument of GOTHRU
is R2. We would want our procedure to recognize this fact and produce
the plan

     GOTHRU(dx,rx,R2)*

     SPECIALPUSH(bx).

Another reason for rejecting the simple replacement of constants
by parameters is that there is often more generality readily available
in many plans than this simple procedure will extract. For example, the
form of our box pushing plan, GOTHRU followed by PUSHTHRU, does not
require that the room in which the robot begins be the same room into
which the box is pushed. Hence the plan could be generalized as follows:

     GOTHRU(dx,rx,ry)

     PUSHTHRU(bx,dy,ry,rz)

and be used to go from one room to an adjacent second room and push a
box to an adjacent third room.

* We use lower case letters to represent parameters.

The plan generalization procedure we have developed overcomes
these difficulties by taking into account the internal structure of the
plan and the preconditions of each operator. Our first versions of this
procedure often introduced irrelevant items in the generalized plan's
precondition list. For example, the creation of extraneous parameters
might cause the preconditions to include the requirement that box bx be
in room rx and the requirement that box bx also be in room ry. We know
that any semantically correct model that satisfies these preconditions
will require that rx and ry be instantiated to the same room name; hence,
the creation of distinct parameters rx and ry is a superfluous
overgeneralization. Such cases of overgeneralization tended to bog down
the theorem proving operations during planning and therefore degraded
the efficiency of the system. We now eliminate almost all cases of this
problem by introducing a processing step after the MACROP is formed that
searches for such irrelevancies and removes them by "collapsing" two or
more parameters into a single parameter. A complete description of our
plan generalizing procedure is given in Ref. (1).
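
The collapsing step can be illustrated with a small sketch. The Python
below is not the procedure of Ref. (1). It assumes, purely for
illustration, that INROOM is single-valued in its last argument (an
object is in exactly one room), so that two INROOM preconditions on the
same object force their room parameters to be equal. The names
SINGLE_VALUED, is_parameter, and collapse_parameters are hypothetical.

```python
# Illustrative sketch only: merge parameters that a single-valued
# predicate forces to be equal. Not the procedure of Ref. (1).

# Assumption: for these predicates the last argument is determined by
# the preceding ones (an object is in exactly one room).
SINGLE_VALUED = {"INROOM"}


def is_parameter(term):
    """Lower case names are parameters; upper case names are constants."""
    return term[:1].islower()


def collapse_parameters(preconditions):
    """Return (substitution, reduced precondition list).

    Each precondition is a tuple such as ("INROOM", "bx", "rx").
    """
    subst = {}

    def resolve(term):
        # Follow substitution chains to the representative term.
        while term in subst:
            term = subst[term]
        return term

    changed = True
    while changed:
        changed = False
        seen = {}
        for pred, *args in preconditions:
            if pred not in SINGLE_VALUED:
                continue
            key = (pred, *(resolve(a) for a in args[:-1]))
            value = resolve(args[-1])
            other = resolve(seen.setdefault(key, value))
            if other == value:
                continue
            # Two distinct values for a single-valued predicate:
            # collapse a parameter into the other term.
            if is_parameter(value):
                subst[value] = other
            elif is_parameter(other):
                subst[other] = value
            else:
                raise ValueError("contradictory preconditions")
            changed = True

    reduced = []
    for pred, *args in preconditions:
        clause = (pred, *(resolve(a) for a in args))
        if clause not in reduced:
            reduced.append(clause)
    return subst, reduced
```

Applied to the preconditions INROOM(bx,rx) and INROOM(bx,ry), the sketch
maps ry onto rx and leaves a single INROOM(bx,rx) clause. A constant
always survives a merge with a parameter, so a required value such as R2
is not generalized away.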

2. Experimental Results 

We spent a sizable effort during the year running experiments 
with the STRIPS-PLANEX system to determine its behavior characteristics. 
The results of many of these experiments are documented elsewhere, but 
we will provide summary descriptions of some of them in this section. 

Problems were posed to the system in the SRI robot's current 
experimental environment, which is shown in Figure 1; there are seven 
rooms, eight doors, and several boxes about two feet high. A typical 
state of this environment is modeled by STRIPS using about 160 axioms. 

a. Operator Descriptions 

The operator descriptions given to STRIPS for these experiments
model the robot's preprogrammed action routines for moving the robot
next to a door in a room, next to a box in a room, to a location in a
room, or through a door. There are also operators that model action
routines for pushing a box next to another box in a room, to a location
in a room, or through a door. In addition, we have included operator
descriptions that model fictitious action routines for opening and
closing doors; these are given in Table 1.

FIGURE 1 MAP OF SHAKEY'S EXPERIMENTAL ENVIRONMENT

b. Example Problems 

A sequence of five problems was designed to illustrate the
various ways in which MACROPs are used during planning. We show in
Figures 2 through 6 a summary of the system's behavior for each problem
in the sequence. Each summary is preceded by a diagram of the problem's
initial and final states. STRIPS' attention was directed to the rooms
shown in the diagram by closing the doors connecting all other rooms.

Table 2 shows the search tree sizes and running times for
the five problems. The problems were run both with and without the use
of MACROPs for comparison. Even when MACROPs were not being used for
planning we include the MACROP production time, since PLANEX needs the
MACROP to monitor plan execution. Note that the times and the search
tree sizes are all smaller when MACROPs are used and that the MACROPs
allow longer plans to be formed without necessarily incurring an
exponential increase in planning time.

c. Further Experiments 

In another set of experiments that were run with the new
system, the primary goal was to produce long plans. We ran a sequence
of eight problems in our robot environment that culminated in the
production of a 19-operator plan for fetching three boxes from three
different rooms and then pushing the three boxes together. This final
MACROP subsumed the seven earlier ones so that only one MACROP was
retained by the system. Subsequences of the 19-step MACROP could be
used to fetch boxes, push boxes together, move the robot from room to
room, and so on.
Table 1

OPERATOR DESCRIPTIONS

GOTOB(bx)
    (Go to object bx).
    Preconditions:
        TYPE(bx,OBJECT), (∃rx)[INROOM(bx,rx) ∧ INROOM(ROBOT,rx)]
    Deletions:
        AT(ROBOT,$1,$2), NEXTTO(ROBOT,$1)
    Additions:
        *NEXTTO(ROBOT,bx)

GOTOD(dx)
    (Go to door dx).
    Preconditions:
        TYPE(dx,DOOR), (∃rx)(∃ry)[INROOM(ROBOT,rx) ∧ CONNECTS(dx,rx,ry)]
    Deletions:
        AT(ROBOT,$1,$2), NEXTTO(ROBOT,$1)
    Additions:
        *NEXTTO(ROBOT,dx)

GOTOL(x,y)
    [Go to coordinate location (x,y)].
    Preconditions:
        (∃rx)[INROOM(ROBOT,rx) ∧ LOCINROOM(x,y,rx)]
    Deletions:
        AT(ROBOT,$1,$2), NEXTTO(ROBOT,$1)
    Additions:
        *AT(ROBOT,x,y)

PUSHB(bx,by)
    (Push bx to object by).
    Preconditions:
        TYPE(by,OBJECT), PUSHABLE(bx), NEXTTO(ROBOT,bx),
        (∃rx)[INROOM(bx,rx) ∧ INROOM(by,rx)]
    Deletions:
        AT(ROBOT,$1,$2), NEXTTO(ROBOT,$1), AT(bx,$1,$2), NEXTTO(bx,$1),
        NEXTTO($1,bx)
    Additions:
        *NEXTTO(by,bx), *NEXTTO(bx,by), NEXTTO(ROBOT,bx)

PUSHD(bx,dx)
    (Push bx to door dx).
    Preconditions:
        PUSHABLE(bx), TYPE(dx,DOOR), NEXTTO(ROBOT,bx),
        (∃rx)(∃ry)[INROOM(bx,rx) ∧ CONNECTS(dx,rx,ry)]
    Deletions:
        AT(ROBOT,$1,$2), NEXTTO(ROBOT,$1), AT(bx,$1,$2), NEXTTO(bx,$1),
        NEXTTO($1,bx)
    Additions:
        *NEXTTO(bx,dx), NEXTTO(ROBOT,bx)

PUSHL(bx,x,y)
    [Push bx to coordinate location (x,y)].
    Preconditions:
        PUSHABLE(bx), NEXTTO(ROBOT,bx),
        (∃rx)[INROOM(ROBOT,rx) ∧ LOCINROOM(x,y,rx)]
    Deletions:
        AT(ROBOT,$1,$2), NEXTTO(ROBOT,$1), AT(bx,$1,$2), NEXTTO(bx,$1),
        NEXTTO($1,bx)
    Additions:
        *AT(bx,x,y), NEXTTO(ROBOT,bx)

GOTHRUDR(dx,rx)
    (Go through door dx into room rx).
    Preconditions:
        TYPE(dx,DOOR), STATUS(dx,OPEN), TYPE(rx,ROOM), NEXTTO(ROBOT,dx),
        (∃ry)[INROOM(ROBOT,ry) ∧ CONNECTS(dx,ry,rx)]
    Deletions:
        AT(ROBOT,$1,$2), NEXTTO(ROBOT,$1), INROOM(ROBOT,$1)
    Additions:
        *INROOM(ROBOT,rx)

PUSHTHRUDR(bx,dx,rx)
    (Push bx through door dx into room rx).
    Preconditions:
        PUSHABLE(bx), TYPE(dx,DOOR), STATUS(dx,OPEN), TYPE(rx,ROOM),
        NEXTTO(bx,dx), NEXTTO(ROBOT,bx),
        (∃ry)[INROOM(bx,ry) ∧ CONNECTS(dx,ry,rx)]
    Deletions:
        AT(ROBOT,$1,$2), NEXTTO(ROBOT,$1), AT(bx,$1,$2), NEXTTO(bx,$1),
        NEXTTO($1,bx), INROOM(ROBOT,$1), INROOM(bx,$1)
    Additions:
        *INROOM(bx,rx), INROOM(ROBOT,rx), NEXTTO(ROBOT,bx)

OPEN(dx)
    (Open door dx).
    Preconditions:
        NEXTTO(ROBOT,dx), TYPE(dx,DOOR), STATUS(dx,CLOSED)
    Deletions:
        STATUS(dx,CLOSED)
    Additions:
        *STATUS(dx,OPEN)

CLOSE(dx)
    (Close door dx).
    Preconditions:
        NEXTTO(ROBOT,dx), TYPE(dx,DOOR), STATUS(dx,OPEN)
    Deletions:
        STATUS(dx,OPEN)
    Additions:
        *STATUS(dx,CLOSED)

* The addition clauses preceded by an asterisk are the primary additions
of the operator. When STRIPS searches for a relevant operator it
considers only these primary addition clauses.
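
As an illustration of how an operator description of this kind acts on
a world model, the following Python sketch applies the OPEN(dx) entry of
Table 1 to a small set of ground clauses. It is a minimal sketch under
stated assumptions, not the STRIPS implementation: the model is taken to
be a plain set of tuples, terms beginning with "$" in a deletion list are
assumed to match any value, and operators with existentially quantified
preconditions are not handled. The names matches, instantiate, and
apply_operator are hypothetical.

```python
# Illustrative sketch only: a world model as a set of ground clauses
# and the OPEN(dx) operator of Table 1 applied to it.


def matches(pattern, clause):
    """True if clause fits pattern; terms starting with "$" match anything."""
    return len(pattern) == len(clause) and all(
        p.startswith("$") or p == c for p, c in zip(pattern, clause)
    )


def instantiate(clauses, bindings):
    """Substitute parameter values into a list of clause templates."""
    return [tuple(bindings.get(t, t) for t in clause) for clause in clauses]


# The OPEN(dx) description, transcribed from Table 1.
OPEN = {
    "preconditions": [
        ("NEXTTO", "ROBOT", "dx"),
        ("TYPE", "dx", "DOOR"),
        ("STATUS", "dx", "CLOSED"),
    ],
    "deletions": [("STATUS", "dx", "CLOSED")],
    "additions": [("STATUS", "dx", "OPEN")],
}


def apply_operator(op, bindings, model):
    """Return the successor model, or None if a precondition fails."""
    preconditions = instantiate(op["preconditions"], bindings)
    if not all(clause in model for clause in preconditions):
        return None
    deletions = instantiate(op["deletions"], bindings)
    kept = {c for c in model if not any(matches(d, c) for d in deletions)}
    return kept | set(instantiate(op["additions"], bindings))
```

With dx bound to a door D1 that the model records as closed and next to
the robot, apply_operator returns a model in which STATUS(D1,CLOSED) has
been replaced by STATUS(D1,OPEN); applying OPEN a second time returns
None because the CLOSED precondition no longer holds.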
FIGURE 2 EXAMPLE PROBLEM 1


Task Statement:

INROOM(ROBOT,RRAM) ∧ NEXTTO(BOX1,BOX2)

Generalized Plan: 

MACROP1(par29,par37,par45,par54,par33) 

GOTOB(par29) 

PUSHB(par29,par37) 

GOTOD(par45) 

GOTHRUDR(par45,par54) 

Comments: 

The generalized plan for the first problem in the sequence pushes 
two boxes together and takes the robot into an adjacent room, 
given that the robot and the boxes are initially all in the same 
room. 




FIGURE 3 EXAMPLE PROBLEM 2 


Task Statement: 

INROOM(ROBOT,RPDP) ∧ NEXTTO(BOX2,BOX3)

Generalized Plan:

MACROP2(par27,par52,par72,par91,par111,par38,par40)

GOTOD(par27)

GOTHRUDR(par27,par40)

GOTOB(par52)

PUSHB(par52,par72)

GOTOD(par91)

GOTHRUDR(par91,par111)

Comments:

The second problem is similar to the first except that different
rooms and different boxes are used, and the robot begins in a room
adjacent to the room containing the boxes. STRIPS uses a tail of
MACROP1 to get the robot into the room with the boxes and then uses
the entire MACROP1 to complete the plan. The generalized plan takes
the robot from one room into an adjacent room, pushes two boxes
together in the second room, and then takes the robot into a third
room adjacent to the second. The system notes that MACROP1 is
completely contained in MACROP2 and therefore erases MACROP1.
FIGURE 4 EXAMPLE PROBLEM 3


Task Statement:

INROOM(ROBOT,RPDP)

Generalized Plan:

MACROP3(par24,par59,par82,par32,par42)

GOTOD(par24)

GOTHRUDR(par24,par42)

GOTOD(par59)

OPEN(par59)

GOTHRUDR(par59,par82)

Comments:

The third problem entails taking the robot from one room through
a second room and into a third room, with the added complication
that the door connecting the second and third rooms is closed.
STRIPS first decides to use MACROP2 with the box-pushing sequence
edited out and then finds that the door must be opened; to get the
robot next to the closed door, a head of MACROP2 is selected with
the box-pushing sequence again edited out. After formation of the
plan to go to the door and open it, the PLANEX scan observes that
only the final operator of the first relevant instance of MACROP2
is needed to complete the plan. The generalized plan takes the
robot from one room into an adjacent room, then to a closed door
in the second room, opens the closed door, and then takes the robot
through the opened door into a third.
FIGURE 5 EXAMPLE PROBLEM 4


Task Statement:

NEXTTO(BOX1,BOX2) ∧ NEXTTO(BOX2,BOX3)

Generalized Plan:

MACROP4(par37,par80,par102,par123,par134,par57,par59)

GOTOD(par37)

GOTHRUDR(par37,par59)

GOTOB(par80)

PUSHB(par80,par102)

GOTOB(par123)

PUSHB(par123,par134)

Comments:

The fourth problem requires that three boxes be pushed together,
with the robot beginning in a room adjacent to the room containing
the boxes. A head of MACROP2 is used to get the robot into the
room with the boxes and to push two of them together; the
box-pushing sequence of MACROP2 is used to complete the plan, again
with the assistance of the PLANEX scan. The generalized plan takes
the robot from one room into an adjacent room, pushes one box to
a second box, and then pushes a third box to a fourth box.
FIGURE 6 EXAMPLE PROBLEM 5


Task Statement:

NEXTTO(BOX1,BOX2) ∧ NEXTTO(BOX3,BOX4)


Generalized Plan:

MACROP5(par44,par87,par151,par208,par237,par265,par294,par180,
par130,par64,par66)

GOTOD(par44)

GOTHRUDR(par44,par66)

GOTOD(par87)

OPEN(par87)

GOTHRUDR(par87,par130)

GOTOD(par151)

GOTHRUDR(par151,par180)

GOTOB(par208)

PUSHB(par208,par237)

GOTOB(par265)

PUSHB(par265,par294)

Comments:

The fifth problem requires the robot to go from one room into a
second room, open a door that leads into a third room, go through
the third room into a fourth room, and then push together two pairs
of boxes. The plan, which is formed by combining all of MACROP4
with all of MACROP3, is well beyond the range of plans producible
by STRIPS without the use of MACROPs. Note that although MACROP4
was created by lifting a plan that pushed three boxes together, it
has enough generality to handle this form of a four-box problem.
Following the creation of MACROP5, MACROP3 and MACROP4 are
recognized as redundant and deleted; hence the net result of this
learning sequence is to add only MACROP2 and MACROP5 to the system.
Table 2

STATISTICS FOR STRIPS BEHAVIOR

                               Problem 1  Problem 2  Problem 3  Problem 4  Problem 5

Without MACROPs
  Total time (min:sec)           3:??       9:42       7:03      14:09        -
  Time to produce MACROP         1:??       1:28       1:11       1:43        -
  Time to find unlifted plan     2:??       8:14       5:52      12:26        -
  Total nodes in search tree      10         33         22         51         -
  Nodes on solution path           9         13         11         15         -
  Operators in plan                4          6          5          7         -

With MACROPs
  Total time (min:sec)           3:??       3:54       6:34       4:37       9:13
  Time to produce MACROP         1:??       1:32       1:16       1:37       3:24
  Time to find unlifted plan     2:??       2:22       5:18       3:00       5:49
  Total nodes in search tree      10          9         14          9         14
  Nodes on solution path           5          9          9          9         14
  Operators in plan                4          6          5          6         11
The experiments we have been discussing show the use of
MACROPs during planning. We have also run experiments with PLANEX to
illustrate the use of MACROPs during plan execution. One such experiment
is shown in a film (Ref. 2) that illustrates how PLANEX monitors robot
task execution in the seven-room experimental environment. One
interesting sequence in this experiment shows the robot attempting to go
from one room through a second room into a third room. After entering
the second room, the robot discovers that a box is blocking the doorway
that leads into the third room. Since PLANEX is working with a
generalized plan, the difficulty can be overcome by finding a different
instance of the plan that is satisfied. This new instantiation of the
plan's parameters causes the robot to be sent from the second room into
a fourth room and then into the target third room.
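
The idea of finding a different instance of a generalized plan can be
sketched in a few lines of Python. The sketch is illustrative only and
is not the PLANEX mechanism: it assumes a generalized two-step plan
GOTHRUDR(d1,mid) followed by GOTHRUDR(d2,goal), a list of directed
(door, from-room, to-room) triples standing in for the CONNECTS axioms,
and a set of doors found to be blocked. All room and door names, and the
function name instantiations, are hypothetical.

```python
# Illustrative sketch only: enumerate the instances of the two-step plan
#     GOTHRUDR(d1, mid); GOTHRUDR(d2, goal)
# that avoid doors known to be blocked. Not the PLANEX mechanism.


def instantiations(connects, blocked, start, goal):
    """Yield (d1, mid, d2) bindings that lead from start to goal."""
    for d1, room_a, mid in connects:
        if room_a != start or d1 in blocked:
            continue
        for d2, room_b, room_c in connects:
            if room_b == mid and room_c == goal and d2 not in blocked:
                yield d1, mid, d2


# Hypothetical layout: R2 connects directly to R3 through D1, and also
# to R4 through D2; R4 connects to R3 through D3.
CONNECTS = [
    ("D1", "R2", "R3"),
    ("D2", "R2", "R4"),
    ("D3", "R4", "R3"),
]
```

With door D1 marked as blocked, the only binding the sketch finds from
R2 to R3 sends the robot through D2 into R4 and then through D3 into R3,
which mirrors the detour through a fourth room described above.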

C. Preliminary Specifications for a New Problem Solving System 
1. Criteria for the New System 

Although STRIPS represents a considerable advance over earlier,
theorem-proving based problem-solvers, it still has many shortcomings.
On recognizing these, we are faced with the option either of adding some
additional features to the STRIPS system or of creating a new system
designed specifically to meet certain criteria. We have chosen the latter
course, partly because of the availability of the new language, QA4, that
simplifies the task of writing problem solving programs of the type we
envision.

We would like our new problem solving and execution system to
contain as many as possible of the following features.
• Plans with loops and branches

STRIPS produces "straight-line" code only,
although it would not be too difficult to
modify it so that it could produce plans with
branches.

• Plans incorporating information gathering operators

The new system should be able to plan to acquire
information when it needs it to complete a plan.
Such an ability presupposes the ability to generate
plans having branches.

• Procedurally defined operators

The preconditions and effects of STRIPS operators
are stated in a rigid format. Operator definitions
in our new system will be defined procedurally
using QA4 programs.

• Hierarchical planning

We want our new system to be able to generate a
plan at some appropriately high level and then
expand the steps of this plan by planning at
successively lower and lower levels that encompass
more and more details.

• Ability to learn*

The STRIPS system was able to save generalized
versions of the plans it generated so that they
could be used later in whole or part as components
of new plans. We would like our new system to have
this feature also.

• Execution monitoring*

The STRIPS system was able to monitor the execution
of plans in an intelligent manner. We would like
the new system to do as well with perhaps a less
clear-cut boundary between the planning and execution
phases.

• Compatibility with Speech Understanding Systems

Ultimately, we would like to be able to interact with
the new system through spoken English. We should be
able to give it commands and advice, tell it facts,
and teach it how to perform new tasks, all by speaking
to it.

• Constraints

The new system should be able to avoid getting into
any situation that we define as "illegal."

• Dynamic environments

We want the new system to perform well in an
environment in which other agents of change (e.g.,
people) are operating.

• Ability to deal with time

The new planning system should be able to use the
concept of time so that it can perform tasks such
as "Go to Room 21 at 3:00 p.m."

• Ability to interact with people

Besides its ability to understand speech, we want
the new system to know some simple facts about the
people around it, including rudimentary information
about their capabilities, locations, and goals. We
would want the system to know, for example, whom to
ask for advice about the location of some other person.

• Ability to work on conjunctive goals

STRIPS had an unsophisticated ability to achieve
two or more simultaneous goals. We would like the
new system to be able to generate the appropriate
plans for highly interdependent goals. (Goals A
and B are interdependent if the appropriate plan
to achieve A and B entails taking some (but not all)
of the steps toward goal A and then taking at least
some steps toward B before finishing A.)

* The STRIPS-PLANEX system has this feature.

For the past few months we have given a great deal of attention to how
these features might be achieved. People working on the development of
the QA4 language have already written some simple illustrative programs
that generate plans incorporating loops, branches, information gathering
operators, constraints, and conjunctive goals (Ref. 4). Mostly these
abilities were illustrated one at a time in separate programs, and there
are still large questions concerning their synthesis in a single system.
The QA4 programs ignored the important question of plan-executing
processes and how they interact with plan-generating processes.
Nevertheless, work to date makes us feel reasonably optimistic about our
ability to program a system containing most of the features mentioned.

We have also worked quite hard on the matter of hierarchical
planning. Our ideas on this subject are still tentative, but we think
the matter to be important and it is discussed in some detail in the
next section.

2. Hierarchical Planning



The ability to plan in a hierarchical fashion has obvious
advantages. The general idea is simple: first a plan consisting of a
few macro-steps is roughed out in some abstract space. Then the steps
of this high level plan are expanded in a little more detail, and so on
until the plan is complete at whatever level of detail is defined by the
available motor actions. We have explored two somewhat different means
of implementing such a hierarchical planner. In the first method, each
of the preconditions of each planning operator is given a "criticality
number." The criticalities might be assigned initially or they might be
computed functions of the predicates and arguments involved. Predicates
with high criticality are important even at the highest levels of
planning. Thus the precondition PUSHABLE(BOX) in a PUSH(BOX) operator
would have high criticality, primarily because the system does not have
an operator that can change the value of PUSHABLE. Whether or not a box
is pushable is not merely a detail that can be faced at a low planning
level. Predicates with low criticality can be ignored at the highest
levels of planning. Whether they are true or false is immaterial since
they can (presumably) be easily given the desired value at a lower
planning level. Thus at the highest planning level the predicate dealing
with the robot's heading, say, can be ignored since it can easily be set
correctly by lower levels.

Using these criticality numbers, we can generate plans
hierarchically in a straightforward manner. First the planner generates
a plan with a high criticality threshold. That is, only those
preconditions whose criticality exceeds a rather high level are
considered. Such a plan consists of a sequence of operators each of
which has rather weak preconditions. Next we could, say, pick the first
operator in the sequence (whose weak preconditions are satisfied in the
initial model) and lower the criticality threshold on its preconditions.
Some more planning steps may now be necessary to satisfy the somewhat
strengthened preconditions. In this manner we generate a plan that
gradually considers more details until finally the preconditions with
the lowest criticality numbers are also considered.

If at any stage the preconditions of an operator in the plan
cannot be satisfied at a lower criticality threshold, the plan at the
next higher level is rejected and an alternative must be found.
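
The effect of lowering the criticality threshold can be sketched as
follows. The Python below is a minimal illustration, not the planner
described here: the criticality values are invented for the example, the
HEADING predicate is a hypothetical stand-in for "the predicate dealing
with the robot's heading," and the functions visible_preconditions and
refinement_passes are our own names. A precondition is considered at a
given level only if its criticality exceeds the threshold, as in the
description above.

```python
# Illustrative sketch only: which preconditions of a PUSH(BOX) operator
# are considered as the criticality threshold is lowered.

# Assumption: criticality numbers are invented for this example.
PUSH_PRECONDITIONS = {
    ("PUSHABLE", "BOX"): 3,          # no operator can change this
    ("NEXTTO", "ROBOT", "BOX"): 2,   # achievable by a GOTO-type operator
    ("HEADING", "ROBOT", "BOX"): 1,  # a detail easily fixed at low level
}


def visible_preconditions(preconditions, threshold):
    """Preconditions whose criticality exceeds the threshold."""
    return {p for p, c in preconditions.items() if c > threshold}


def refinement_passes(preconditions, thresholds):
    """List, from the highest threshold down, the newly considered clauses."""
    seen = set()
    passes = []
    for threshold in sorted(thresholds, reverse=True):
        current = visible_preconditions(preconditions, threshold)
        passes.append((threshold, sorted(current - seen)))
        seen |= current
    return passes
```

Calling refinement_passes(PUSH_PRECONDITIONS, [2, 1, 0]) reports
PUSHABLE at threshold 2, adds NEXTTO at threshold 1, and adds HEADING
only at threshold 0. A full planner would replan at each pass to satisfy
the newly visible preconditions and would reject the higher-level plan
if they could not be met.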

There are important questions here regarding the order in which
steps in a plan at a given level ought to be expanded to lower levels.
An interesting special case might be called FIFE (first in, first
expanded). In the FIFE mode, the first step in the high level plan would
be expanded at the next level of detail. Then the first step in this
expansion would be expanded, and so on until finally the first step
corresponded to an executable action. Here we face a choice. Do we
execute this operator and then continue with FIFE or do we continue
expanding some or all of the rest of the plan before any executions are
allowed?

Another important question concerns the level at which new
planning is to be done as we gradually lower the criticality threshold.
Suppose, for example, that the criticality threshold is at some
intermediate level and we are testing the preconditions of one of the
operators in the plan. If one of these preconditions is not satisfied,
should we generate a high level plan to satisfy it or should we generate
a plan of a level corresponding to the current setting of the
criticality threshold? Our current opinion is that any new planning
activity always ought to occur at the highest level. More details about
this particular method are contained in a memo by Earl Sacerdoti
(Ref. 5). This technique is now being implemented as an addition to the
STRIPS-PLANEX system.

A second method by which hierarchical planning can be
accomplished entails writing separate operators for different planning
levels. For example, we might have the following hierarchy of operators
for achieving the predicate INROOM(ROBOT,ROOM): GOTORM1, GOTORM2, and
GOTORM3. The highest level operator, GOTORM1, would have rather mild
preconditions, such as, say, a test for the existence of the target
room. The next operator, GOTORM2, would have somewhat more restrictive
preconditions, say that the robot must be in a room adjacent to the
target room. Finally GOTORM3 might insist that the door between the
target and adjacent rooms must be open.

First a plan would be generated using only high level operations. 
Then each of the operators in this plan beginning, say, with the first 
would be replaced by its next lower level operator and plans (at the 
highest level) would be generated to achieve its preconditions, and so on. 
This procedure would work very much like the one using criticality numbers, 
except that there is now no necessity that the preconditions of a high 
level operator be a subset of those at lower levels. 
There are many details to be worked out before we can begin
implementing a hierarchical planner. We haven't quite decided yet, for
example, how to administer the search process in generating hierarchical
plans. We desire an executive that can resume work on a possibly
incomplete plan at a higher level should some step in a lower level plan
run into trouble. Communication between levels is also a problem. We
might desire that information discovered by a lower level planner be
available for use in generating alternative high level plans.

Perhaps the most difficult of our tasks will be to integrate a
hierarchical planning feature successfully with all the other features
we mentioned in the last section. In particular, we have had some
difficulty in deciding how a hierarchical planner and a plan execution
system ought to interact.

D. Multirobot Experiments 

In parallel with our work on a new problem solving system, we are 
planning some experiments using STRIPS and the present robot vehicle, 
SHAKEY, in conjunction with a Unimate arm. (A Unimate is a commercially 
available, fixed, industrial manipulator that is being used in our 
laboratory for experiments in the application of techniques in artificial 
intelligence to industrial automation.) Use of both SHAKEY and the 
Unimate will allow us to explore some interesting problems concerned with 
multirobot cooperation. Three fundamental types of experiments are 
envisioned: 

(1) Those in which each robot is viewed by the computer 
as an independent motor device exclusively under its 
control. 

(2) Those in which each robot is strictly autonomous and 
operates effectively in parallel by time sharing the 
same computer facility. 
(3) Those which are a mixture of the first two in which
one robot plays the role of master and the other slave.

Type (1) experiments would not require the duplication of any robot 
software on the computer, but merely the addition of a new set of operators
for the Unimate. The only aspect of robot communication protocol
that is qualitatively different from the one-robot case is the fact that
the two devices can operate asynchronously and thereby possibly physically
interfere with one another, if careful attention is not paid to the sequence
and timing with which operations are carried out.

Our interest in types (2) and (3) experiments stems in part from 
our desire to begin considering worlds containing large numbers of robot 
devices. In such a world, it would not be reasonable to assume that all 
robots were controlled by a single computer. Instead, we would assume 
autonomous robots that might be called together by a human to perform a 
task. The human would not, presumably, want to specify how the task is 
to be subdivided among the robots; the robots must figure this out among 
themselves. 

As a matter of experimental convenience, we will use only two 
robots (Shakey and the Unimate) and a single computer. Hence, our type 
(2) experiments require the duplication of certain subroutines and model 
structures in computer memory to allow each robot to maintain its identity. 
Provision must be made in each robot's model for the current state, goals, 
and potential capability of its counterpart, including the ability to 
communicate. By definition, neither robot can have direct access to the 
other's model data; each must carry out a symmetric dialog to discover 
the other robot's intentions. The distinction between "knowledge" and 
"belief" is now important, in contrast with type (3) experiments in which 
the master robot is assumed by definition to know precisely what the 
slave knows. 
A possible experiment might be for Shakey and the Unimate to be
given a joint task of turning a box upside down and moving it to a distant
location. This task would capitalize on the unique ability of the
Unimate to turn boxes upside down and the unique ability of Shakey to
move them around. In a type (2) experiment, both robots would need to
communicate with one another according to some fixed protocol in order 
to develop jointly a common plan of action to accomplish the goal. This 
simple type of experiment can be easily embellished by adding disjoint 
(possibly conflicting) parallel goals for each robot. Under more stringent 
constraints the difficulty in establishing and maintaining a dialog can 
be made arbitrarily complex. 
III PERCEPTION


A. Introduction 

In the past year we have formulated and refined a new approach to
machine perception intended to overcome the major limitations of existing
systems and to achieve useful real-world perceptual capabilities. This 
section presents, in considerable detail, both our design for a system 
for analyzing perceptual data and a research methodology for implementing 
the system. 

The goal of most vision research has been to describe simple geometric
environments in an exhaustive, bottom-up fashion. Unfortunately,
many crucial perceptual issues (such as information overload, generality
of perceptual strategies, suitable representations for real-world objects,
and segmentation of textured objects from the background) do not arise in
this problem domain. Consequently, that goal has proved largely
self-defeating, leading to strategies and systems that could not be extended
to cope with richer environments. For example, the absence of natural
perceptual redundancy and context meant that each object could only be
recognized in terms of a completely articulated boundary shape description.
Such descriptions, while useful, are difficult to obtain for many
real-world objects. Moreover, detailed shape is often not the most
appropriate distinguishing characteristic. The emphasis on shape also
demanded unreasonable sensitivity and reliability from the initial boundary
extraction routines. This undirected sensitivity further limited
the systems to textureless objects and background in order to avoid
overwhelming the scene analysis stage with irrelevant edge detail.
1. Perceptual Philosophy


In contrast to the above, we have chosen as our primary research
objective the task of finding specified objects in complex real-world
environments. This objective is consistent with the information requirements
of our robot, which typically is concerned only with locating
particular objects involved in its current task. 

We proceed on the premise that there exist easy ways of "seeing" 
things. The redundancy of visual cues and contextual constraints allows 
a desired object to be distinguished from others on the basis of a small 
subset of the available features. To illustrate, a human can usually be 
distinguished from other contents of an office simply as a moving blob 
whose surface area is larger than two square feet. 

Moreover, even when such complicated features as shape are 
required, then if the context is suitably limited, a simple distinguishing 
measure related to that feature may suffice to resolve remaining recognition
ambiguities. Thus, either the relative position of the center of
gravity or the presence of sharp corners should be sufficient to distinguish
the shape of a person from that of Shakey.

2. Design Considerations

The above point of view suggests the following design considerations
for a perceptual system.

a. Perception as Problem Solving 

Perception should be construed as a problem solving process;
the system must utilize its knowledge of the current real-world environment,
and of its own perceptual capabilities, to plan where and how to
look for a specified object. Specifically, the computer must use its
knowledge to select features of the desired object that are both
distinguishing and easy to see.
The utility of such 'distinguishing features' is critically
dependent on what is known at the current stage of analysis. Hence,
it is unreasonable to preprogram these recognition strategies, except in
the simplest and most static environments. Moreover, a system that can
plan its own strategy has inherent generality; it should be able to function
in any environment for which its knowledge base and perceptual
primitives are adequate.

b. Sequential Decision Paradigm 

It is usually unnecessary to examine all features of an 
object to arrive at a confident recognition hypothesis. Perception should 
proceed like the game of 20 questions. Simple descriptive attributes 
(e.g., color, motion, size) should be used initially to establish a 
limited context. Remaining ambiguities can then be resolved within this 
context, using the distinguishing components of complex attributes (e.g., 
shape, texture). 

c. Representation of Complex Objects

Representations for describing the shape and texture of 
complex real world objects are not yet available. Indeed, symbolic 
descriptions may not even be feasible. However, objects can be represented,
for purposes of discrimination, in terms of the many crude measures of 
shape (e.g., perimeter-squared/area, length/width of bounding rectangle) 
and texture (e.g., statistics, power spectrum) that are available. The 
system should use the simplest representations sufficient to distinguish 
the object of interest in a given context. 

d. Multiple Sensors 

The likelihood of finding suitable surface attributes for 
distinguishing a given object increases with the number of independent 
sensory modalities. Furthermore, simple discrimination in each of
several sensory modalities should be a cheaper, more reliable alternative
to using more detailed descriptions in a single modality; color and
surface orientation (obtained directly from range data) are substantially
easier to process than shape and texture, the principal means of
distinguishing objects in gray-scale images.

e. Goal Directed Feature Extraction 

A key problem in doing perception by distinguishing features
is to extract the features reliably. One lesson that has been
repeatedly learned in a decade of vision research is that feature extraction
cannot be adequately performed bottom-up, as a preliminary to a
knowledge-based interpretative process.

Rather, the system must be integrated so that recognition 
strategies are based on knowledge of which features are easy to extract 
in a given environmental context. The low level routines should then 
concentrate on extracting those specific features, guided again by 
knowledge of their distinguishing attributes. For example, regions can 
be grown on the basis of surface attributes that are known to be both 
homogeneous over the goal object and distinguished from those of other 
nearby surfaces, previously found or anticipated in the environment. 

f. Incremental Acquisition of Knowledge 

The substantial amount of ad hoc world knowledge required 
to plan perceptual strategies is most reasonably acquired in an incremental
fashion. The system should thus be designed to request additional
information from a user at times of failure, indecision, or on encountering
a new object, and to incorporate this information immediately in a
revised strategy. The new strategy establishes empirically whether the
current description of an object is sufficient to distinguish it from
others already known. 


3. Organization of this Section 

We have completed the conceptual design of a perceptual system, 
incorporating each of the features listed above. 

In Section III-B we present an overview of the perceptual
strategy.

Section III-C describes a system design, modeled after utility 
theory, for planning and executing this strategy. 

Section III-D outlines our planned approach for utilizing high 
level world knowledge to direct low level feature extraction. 

Section III-E summarizes related work on interpreting color 
and range data, to realize a multisensory capability.

Section III-F concludes with our plans for implementing and 
experimenting with the system. 

B. Overview of System 
1. Scenario 

Before describing the system, let us first describe a scenario 
that has played an important role in helping refine our thinking. Its 
inclusion here serves three functions: it illustrates the intended mode 
of system operation, it provides explicit examples to clarify later 
discussions on perceptual strategy, and it establishes some concrete 
operational objectives against which our system, when implemented, will 
be evaluated. 
a. Scenario Objective

The scenario objective is to analyze an uncluttered office 
environment like that shown in Figure 7, using brightness, color, and 
range as sensory inputs. The basic task is to find a designated object 
(e.g., a chair). The system will designate its comprehension by outlining 
the indicated object on a grey-scale display of the scene. A second 
task is to describe the scene, which in this limited environment can be 
accomplished by commanding the system to find each of the small set of 
known objects. 


b. Knowledge Base 

The system's world model will initially contain five 
typical objects selected from offices, and three room fixtures. The 
objects are semantically constrained to appear in normal office relationships
(e.g., chairs are not allowed to be on tables). Table 3 conveys,
informally, the scope of information that will be available to the system.
The given attributes and relations do not constitute complete descriptions,
but should be adequate to distinguish among the objects using
color and range data. (Partial descriptions are, in fact, preferred 
because they provide possibilities for generalization. The descriptions 
can always be refined interactively, should the system err on their 
account.) The descriptions also indicate the range of specificity with 
which knowledge can be provided. Thus, dimensions may be given exactly, 
corresponding to particular objects (e.g., the door and chair in Table 3), 
or with a tolerance encompassing a class of possible objects (e.g., the 
picture in Table 3). 

Though spatial relationships are expressed here as gross 
symbolic constraints (e.g., back of chair "parallel to and above" seat),
it is often more convenient, in practice, to represent such metrical and 





FIGURE 7 A SIMPLIFIED OFFICE ENVIRONMENT REPRESENTING OUR EXPERIMENTAL DOMAIN 
Table 3

INITIAL WORLD MODEL

Wall
   Attributes:  Vertical plane
                Dimensions: > 7 ft high (except at door) and > 8 ft long
                Shape: rectangular, except for doorway
                Color: buff, homogeneous
   Relations:   Adjacent and perpendicular to floor (below) and other
                walls (side). Adjacent to doorway and door.

Floor
   Attributes:  Horizontal plane (height = 0)
                Shape: linear boundaries
                Dimensions: at least 5 ft of extent along x and y axes,
                delimited by intersection with walls
                Color: buff with white and brown streaks
   Relations:   Adjacent and perpendicular to walls, door, and doorway.
                Continuous through doorway. Supports chair, table, basket.

Door: Door body
   Attributes:  Vertical rectangular prism
                Size: 3-1/2 ft wide by 7 ft high by 1-1/2 in. thick
                Color: brown, wood grain
   Relations:   Hinged (-90 < angle < 90) to wall on one vertical edge.
                Adjacent to floor on bottom.

Door: Door knob
   Attributes:  Cylindrical prism
                Dimensions: 2 in. (diameter) by 2 in. (length)
                Color: silver
   Relations:   On (i.e., base coplanar with) both wide faces of door body
                at height = 38 in. It is 5 in. from unhinged edge.

Chair: Seat
   Attributes:  Horizontal rectangular plane prism
                Dimensions: 18 in. by 18 in. (horizontal plane, top) and
                18 in. by 4 in. (vertical side planes)
                Height: 16 in. to center of gravity
                Color: tan or gray
   Relations:   Supported by legs which are attached at each corner of
                the bottom rectangular face. Width of back parallel to
                and above edge of seat.

Chair: Back
   Attributes:  Vertical rectangular prism
                Height: 24 in. to center of gravity
                Dimensions: 12 in. high by 17 in. wide by 1-1/2 in. thick
                Color: same as seat

Chair: Legs
   Attributes:  Vertical rectangular prisms
                Number: 4
                Color: gray or cocoa
                Size: 1-1/2 by 1-1/2 by 14 in. (height)
   Relations:   Supported by floor.

Table: Top
   Attributes:  Horizontal rectangular prism
                Color: buff or gray
                Dimensions: 3 to [illegible] ft long, 2 to 4 ft wide, and
                28 to 36 in. high
   Relations:   Supported by table legs.

Table: Legs
   Attributes:  Vertical rectangular prism
                Height: 20 to 36 in.
   Relations:   Supports table top at corners. Legs supported by floor.

Basket
   Attributes:  Vertical cylinder
                Color: gray or brown
                Hollow on top
                Dimensions: 13 in. (diameter) by 14 in. (height)
   Relations:   Supported by floor.

Picture
   Attributes:  Vertical rectangular plane
                Dimensions: 8 in. to 30 in. (length or height)
                Multiple colors, usually in small regions.
   Relations:   On (coplanar with) wall.

Telephone: Case (base)
   Attributes:  Horizontal rectangular prism
                Dimensions of base: 5 in. wide, 8 in. long, and 1 in. high
                Color: black
   Relations:   Supported by table.

Telephone: Case (wedge)
   Attributes:  Horizontal rectangular wedge
                Dimensions: 5 in. wide, 8 in. long, and 3-1/2 in. high
                Color: black
   Relations:   Supported by and aligned with top of rectangular prism.

Telephone: Dial
   Attributes:  Cylindrical prism with multiple holes in end
                Dimensions: 3 in. in diameter by 1/8 in. thick
                Color: gray
   Relations:   Centered on sloping face of wedge.
topological relationships implicitly, in the form of structural models
(like those used for computer graphics). In our initial implementation
we plan to use crude structural models for objects with several parts.
A chair, for example, will be represented by rectangular prisms for seat,
back, and legs. Such a model should convey spatial relationships in
sufficient detail for planning perceptual strategies, but be simple to
obtain and use in early experimentation.

2. Perceptual Strategy 

The search for an object proceeds in two phases, called acquisition
and validation. During acquisition, the multisensory image is
sampled for characteristic surface attributes of the desired object. If
a sample satisfying all criteria is found, a sequence of top-down validation
tests determines whether the acquired sample does, in fact, belong
to the desired object, or to another object with similar surface
characteristics. Each of these search phases will now be discussed in more
detail.

a. Acquisition 

The selection of acquisition attributes for sampling is 
based on such considerations as: 

(1) Criteriality—the attribute should invariably 
be associated with the object (e.g., the
floor is always horizontal). 

(2) Distinguishability—the attribute should not 
also be characteristic of other objects 
expected in that context. 

(3) Measurability—the attribute should be reliably 
obtained from simple, localized processing of 
sensory data. 
These criteria are applied to the characteristics of each
surface of an object to ascertain the best set of surface attributes.
Let us illustrate our approach with the goal of finding a chair. The
initial system implementation will include primitives for testing height,
color, and local surface orientation. The discussion on multisensory
data interpretation in Section III-E suggests that height will be the
most easily measured attribute, followed by horizontal orientation,
color, and vertical orientation. Refer now to the description of a chair
in the basic world model (Table 3). Height and horizontal orientation
are both criterial to the seat of a chair, and together are unique in
that environment. The color alternatives make that attribute less
criterial. Since vertical orientation, the primary attribute of the
back of the chair, is both less distinguishing and more difficult to
test, a chair should be sought by sampling for a height of 18 inches,
and checking successful points for local horizontal orientation.

Table 4 summarizes the anticipated best acquisition
attributes for each object in the scenario. If a search fails to satisfy
one of these criteria, the system can then select alternative attributes
based on plausible explanations for the failure. For example, the seat
of a chair might not be visible if the chair were viewed from behind.
Thus, failing to find a horizontal sample of appropriate height for a
seat, the system could next look for characteristic attributes of a back
support.
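
The acquisition step for a chair, including the fallback to the back support, can be sketched as follows. This is only an illustration under stated assumptions: the `Sample` fields, the 1-inch height tolerance, and the function names are hypothetical and do not come from the system described here. The tests are applied cheapest first, in the order Table 4 gives for a chair.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Sample:
    """One multisensory sample point (field names are illustrative)."""
    height_in: float   # height above the floor, from range data
    orientation: str   # "horizontal", "vertical", or "inclined"
    color: str


Test = Callable[[Sample], bool]

# Seat tests in the order Table 4 lists for a chair.
# The 1-inch tolerance on height is an assumption, not a reported value.
SEAT_TESTS: List[Test] = [
    lambda s: abs(s.height_in - 18.0) <= 1.0,
    lambda s: s.orientation == "horizontal",
    lambda s: s.color in ("tan", "gray"),
]

# Fallback when the seat is hidden (chair viewed from behind):
# color is tested before vertical orientation because it is cheaper.
BACK_TESTS: List[Test] = [
    lambda s: s.color in ("tan", "gray"),
    lambda s: s.orientation == "vertical",
]


def acquire(samples: Iterable[Sample], tests: List[Test]) -> Optional[Sample]:
    """Return the first sample passing every test; all() stops at the first failure."""
    for sample in samples:
        if all(test(sample) for test in tests):
            return sample
    return None


def find_chair(samples: Iterable[Sample]) -> Optional[Tuple[str, Sample]]:
    """Try the seat attributes first, then the back-support attributes."""
    pool = list(samples)
    for part, tests in (("seat", SEAT_TESTS), ("back", BACK_TESTS)):
        hit = acquire(pool, tests)
        if hit is not None:
            return part, hit
    return None
```

A sample returned here is only an acquisition hypothesis; a gray wastebasket would also pass the fallback tests, which is why a validation phase follows.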

Sampling may be localized to specific areas of the scene 
on the basis of objects already recognized. For instance, a wastebasket 
need only be sought in areas bordering the floor region. In fact, the 
planning algorithm discussed later will have the option of looking first 
for a contextually related object that is larger or otherwise easier to
find, in order to localize the desired object with reduced total search 
Table 4

ATTRIBUTES AND CONTEXTUAL RELATIONS FOR ACQUISITION

Object       Acquisition Attributes (test in order shown)

Wall         Color - buff; orientation - vertical
Floor        Height 0 in.; orientation - horizontal
Door         Color - brown; orientation - vertical
Chair        Height ≈ 18 in.; orientation - horizontal;
             color - tan or gray
Table        Height 20 to 36 in.; orientation - horizontal;
             color - buff or gray
Basket       Color - gray or brown; orientation - vertical;
             context - on floor
Picture      Buff/nonbuff boundary; context - on wall
Telephone    Color - black; orientation - vertical or inclined;
             context - on table


effort. Thus, to find a telephone, it might pay first to find a desk.
Humans often pursue a similar style of perceptual search. 

b. Validation 

The validation process begins by checking the global 
attributes of the surface surrounding the acquisition sample. This 
surface is extracted by grouping the acquisition sample with proximate
samples having similar attributes. The resulting region is then checked 
for appropriate size, shape, global uniformity and so on. 

The global attributes help distinguish the desired surface 
from surfaces with similar acquisition attributes, belonging to other 
objects in the knowledge base. We intend to rely on the consensus of
several crude, individually tolerant tests related to these attributes,
rather than actually to extract a detailed description of any particular
one. For example, the ratio of perimeter squared to area, or of length
to width, might suffice as a representation of shape. The system will
select the simplest of such tests that provide adequate discrimination.
In addition, certain critical dimensions (e.g., length, width, area) will
be measured absolutely (using range data) to reduce the likelihood of an
unknown object slipping through the explicit discriminations.
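
The two crude shape measures named above are easy to compute once a region boundary is available. The sketch below assumes the region is given as a simple polygon of (x, y) vertices and uses an axis-aligned bounding rectangle; both choices, and the function names, are illustrative assumptions rather than details of the planned system.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def perimeter_squared_over_area(vertices: List[Point]) -> float:
    """Scale-invariant compactness measure: perimeter**2 / area of a simple polygon."""
    perimeter = 0.0
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        perimeter += math.hypot(x1 - x0, y1 - y0)
        twice_area += x0 * y1 - x1 * y0   # shoelace formula term
    area = abs(twice_area) / 2.0
    return perimeter ** 2 / area


def length_over_width(vertices: List[Point]) -> float:
    """Aspect ratio of the axis-aligned bounding rectangle (an assumed simplification)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    dx, dy = max(xs) - min(xs), max(ys) - min(ys)
    return max(dx, dy) / min(dx, dy)


def within(value: float, expected: float, tolerance: float) -> bool:
    """An individually tolerant test: pass if value is within a relative tolerance."""
    return abs(value - expected) <= tolerance * expected
```

For an 18 in. by 18 in. chair seat viewed from directly above, the first measure is 72**2 / 324 = 16 and the second is 1. A consensus test would call `within` on several such measures and accept when enough of them agree.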

Unfortunately, past experience suggests that a test is
as likely to fail because of errors in region growing as because the
wrong object was acquired. We hope to overcome the unreliabilities
inherent in the region growing process by tightly controlling it with
feedback from the evaluation objectives. This will be discussed in
Section III-D.

If the confidence remains indecisive after surface evaluation,
or if specific ambiguities remain, then validation continues by
seeking additional parts of the object (e.g., the back of a chair, the
knob of a door). These subobjects can themselves be acquired and validated
by using the basic programs recursively.

Additional confidence can be obtained by finding other
objects in appropriate contextual relationships, such as "support"
(e.g., telephone on table), adjacency (e.g., wall and door), functional
(e.g., hammer near nails), and the like.

Validation proceeds as a sequential decision process that 
terminates when a definitive level of confidence has been reached. After 
each feature is sought a decision must be made whether to accept the 
original acquisition hypothesis, to reject it (and resume sampling) or 
to continue the validation process. This decision will depend on parameters
of required confidence and allocated budget, reflecting the global
importance of this current goal to a higher level robot strategist. The
decision to terminate validation also depends, in part, on the degree to 
which the desired object must be localized in the scene, since the more 
features of an object that have been found, the more tightly location 
and orientation are constrained. 
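
The accept, reject, or continue decision can be sketched as a loop over validation tests. The report does not say how confidence is updated after each feature, so this sketch assumes each test is a function that maps the current confidence to a revised one; the thresholds, costs, and names are likewise hypothetical.

```python
from typing import Callable, List, Tuple

# Each validation step is (cost, update), where update maps the current
# confidence to a revised confidence. This interface is an assumption.
Step = Tuple[float, Callable[[float], float]]


def validate(steps: List[Step],
             prior: float,
             required_confidence: float,
             reject_below: float,
             budget: float) -> Tuple[str, float]:
    """Run validation steps until a decision is reached or the budget runs out.

    Returns (decision, cost_spent), where decision is "accept", "reject",
    or "undecided" (budget or tests exhausted without a definitive answer).
    """
    confidence = prior
    spent = 0.0
    for cost, update in steps:
        if spent + cost > budget:
            break                          # cannot afford the next test
        spent += cost
        confidence = update(confidence)
        if confidence >= required_confidence:
            return "accept", spent         # hypothesis confirmed
        if confidence <= reject_below:
            return "reject", spent         # resume sampling elsewhere
    return "undecided", spent
```

A higher level strategist would set `required_confidence` and `budget` according to the importance of the goal, as described above.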

C. Conceptual Design 
1. Introduction 

We would now like to describe in detail a system designed to 
plan and execute the kind of perceptual strategy described above. The 
design goal is to find a specified object with minimal cost (i.e., computation
time) within an allotted budget, while maintaining a required
level of reliability (i.e., effectiveness). This goal requires that the 
system utilize all available information about the object, the general 
environment, detector routines, and current sensory information. 

The system will generate a planning tree representing alternative
ways to acquire and then validate an object. Initially, only paths
emanating from the most promising acquisition features will be explored 
in detail. Moreover, less detailed cost and reliability estimates will 
be used in calculating the utility of validation features, to simplify 
planning. The system will then proceed to execute the most promising 
path based on its initial information, planning in greater detail as the
strategy successfully progresses to the validation phase. On the other
hand, if a strategy fails, the system can utilize the new information it
acquired during execution to choose an alternative execution path or 
perhaps to resume planning a previously unpromising approach. Since 
knowledge is constantly acquired during the perceptual process, planning 
and execution will be tightly interwoven. 
The planning/execution interaction will be coordinated by the
GOAL MONITOR (GM). The task of the GM will be to account for the effects
of the success or failure of one subgoal on the rest of the subgoals in
the planning tree awaiting execution. Thus, achieving one subgoal could
remove others from consideration, could restrict the search space for 
others, or might change the importance of satisfying them. The effect 
of goal monitoring is to use the most current information available to 
dynamically coordinate the planning and execution. This process retains 
the flavor of utility theory (i.e., always do whatever is currently most 
promising), but many practical drawbacks of the formal theory can be 
avoided. In particular, the combinatorial difficulties entailed in 
global optimization will be minimized by considering many processes to 
be independent until much later in the planning and execution sequence. 
Since we do not have accurate utility estimates, we cannot (and do not 
wish to) look too far ahead in our planning. 

The satisfaction of our goal proceeds in three main steps:

(1) The planning phase, which may include the generation
of alternative paths to the goal. For example, to
find a telephone either look for the telephone
directly, or look for a desk first and then look
for a telephone on the desk. The initial planning
will be concerned with acquisition of the object.
Additional considerations, such as validation and
refinement, will then be examined for the most
promising paths.

(2) The acquisition phase, where some features of the
current best subgoal are used to attempt to locate
it in the scene. For example, if it is determined
that the best way to find the telephone is via
the desk, it may be decided to attempt to acquire
the desk on the basis of a large surface of a
certain color.

(3) The validation phase where, having found the initial
acquisition feature, the system attempts to verify
the acquisition through other parts or contextual
relationships. For example, having located what
appears to be the top of the desk, the system may
attempt to show that it really is the desk by
finding the sides.

2. Definitions 

Before proceeding, we would like to define certain parameters 
and explain how we plan to use them. We will assume that the input to 
the system is a list of goals of the find type and a budget allocation. 
Associated with each goal will be the interest (I) that the robot executive
has assigned. The interest is a number from 0 to 1, and is a measure
of the importance of the goal to the executive. The I values will be 
used to set the confidence with which the goal must be achieved. From 
the set of I values, the system will compute the normalized interest (NI) 
for each goal. The NI of a goal will be the I of that goal divided by 
the sum of the I's for all goals. 

The budget is split into two parts: a fixed percentage is to 
be used for planning, and the rest for execution. The allocation of 
planning budget will be made in proportion to the NI of each goal. This 
ensures that the effort expended on a goal is proportional to its overall 
interest to the system. After planning for a goal, any remaining budget 
for that goal is reallocated among waiting goals. The execution budget 
allocation will be based on the results of the planning, and will change 
dynamically during the course of execution and replanning. 
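
The normalized interest and the proportional split of the planning budget can be written out as a short sketch. The goal names, the planning fraction, and the rule for sharing leftover budget (in proportion to NI among the waiting goals) are assumptions made for illustration; the report states only that leftover budget is reallocated among waiting goals.

```python
from typing import Dict, List


def normalized_interest(interest: Dict[str, float]) -> Dict[str, float]:
    """NI of a goal = I of that goal / sum of I over all goals."""
    total = sum(interest.values())
    return {goal: value / total for goal, value in interest.items()}


def planning_allocation(interest: Dict[str, float],
                        budget: float,
                        planning_fraction: float) -> Dict[str, float]:
    """Split the planning share of the budget in proportion to each goal's NI."""
    planning_budget = budget * planning_fraction
    ni = normalized_interest(interest)
    return {goal: planning_budget * share for goal, share in ni.items()}


def reallocate(leftover: float,
               waiting: List[str],
               interest: Dict[str, float]) -> Dict[str, float]:
    """Share a finished goal's unused planning budget among waiting goals.

    Sharing in proportion to NI among the waiting goals is an assumed rule.
    """
    ni = normalized_interest({goal: interest[goal] for goal in waiting})
    return {goal: leftover * share for goal, share in ni.items()}
```

With interests of 0.9 for a chair and 0.3 for a telephone, the NI values are 0.75 and 0.25; a budget of 100 units with 20 percent reserved for planning gives the two goals 15 and 5 units of planning budget.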

We will take cost to be a measure of the processing time required
for completion of a task to a certain level of reliability. The
reliability is the probability that the results returned for a given
task are correct; we will use the term confidence synonymously with
reliability.
We will use FOM to represent a computed figure-of-merit for
each node in the planning tree. This will take into account such things
as the expected cost of achieving that subgoal with a required confidence,
and its importance to higher goals. We will distinguish two factors
used in computing the FOM: a direct FOM (DFOM) and an indirect FOM (IFOM).
The DFOM measures the ease with which a feature (a tip node of the tree)
can be acquired directly, that is, by looking for the feature in a picture.
The IFOM measures the utility of the contextual information provided
by one object to the goal of locating another object. The IFOM
is a property of a relation between two objects (and is therefore associated
with a branch of the planning tree). It gives an estimate of how
close you are to satisfying a higher level goal given satisfaction of a
lower one. The FOM of an intermediate node of the tree will be generated
by backing-up the DFOMs and IFOMs of the tips and branches of the tree
beneath it. The values of the IFOMs, DFOMs, and FOMs will all be within
the interval [0,1]. The FOM of a goal will be used to decide which is
the best thing to pursue next during planning and execution.

We will also be referring to three data bases. The first is 
the VISUAL MEMORY (VM), which will contain permanent perceptual knowledge 
about objects, parts of objects, visual features of parts, and relationships
between them. The information in VM will be obtained mainly by
interacting with a human operator when new objects are encountered or 
when a strategy based on existing knowledge fails. Some details of this 
interaction appear in Section III-F. 

The second data base is a short term world model (STWM) in 
which the results of current processing are represented as instances of 
the items in VM. An accompanying coordinate transformation gives the 
actual location and orientation of each instantiation. There is also a 
low level data structure for representing regions, boundaries, and similar
partially digested information about the scene.
The third major data base is the detector library, containing
both perceptual operators and knowledge about their application. This
knowledge includes estimates of cost and reliability, as well as pragmatic
advice on the use of given operators in particular contexts.


3. Planning 

We would like to give an overview of the initial stage of 
planning by describing in general the course of planning. For the moment 
we will assume the existence of values of DFOM and IFOM at each stage, 
later discussing criteria that may be used in their computation. 

a. Plan Generation 

First we need to describe the hierarchical structure of
visual memory, illustrated in Figure 8. Objects have parts, and parts
have features. Features are the visual attributes that can be found
directly in an image. Examples of features include color, surface
orientation, texture, size, and shape. The basic concept of part is
a surface with a collection of relatively homogeneous features. (Parts
may themselves have parts, but we will not consider that here.)

FIGURE 8 VISUAL MEMORY HIERARCHY

Let us assume that we are trying to satisfy a single
goal, FIND OBJECT1, and have already allocated part of the budget to
that goal for planning. Let us also assume that we have an FOM value
for the object to be found that we will use as a cutoff value in planning
(the target FOM). We will evaluate the advisability of attempting to
find an object directly versus finding it indirectly at each stage of
the plan. The plan we ultimately devise will be the "best" branch of
the planning tree.

The first level of the planning tree for finding OBJECT1
is merely the object-feature hierarchy detailed above, and is illustrated
in Figure 9. OBJECT1 has two parts, P1 and P2. In turn, P1 has features
F1, F2, and F3. P2 has F4 and F5 for its features. The IFOM(OBJECT,
PART) is always 1, since finding a part of an object brings you all the
way to that object. Likewise, the IFOM(PART, FEATURE) is also 1. We
will write the IFOM values on the branches (see Figure 9).

FIGURE 9 PLANNING TREE FOR FINDING OBJECT 1

The FOM of a tip node will be the DFOM of the feature at
that node. The backed-up FOM value of a nontip node will be computed
from the FOMs of the nodes directly below it in the tree, and the IFOMs
of these nodes with respect to it, by taking the maximum of the set of
values of IFOM(PARENT,OFFSPRING) * FOM(OFFSPRING). The FOM of a node can
therefore be considered to be the FOM of the planning tree from that node.
Thus, the current FOM(OBJECT1) is equal to

max{FOM(P1) * IFOM(OBJECT1,P1) , FOM(P2) * IFOM(OBJECT1,P2)} .

Now, we have

FOM(P1) = max{FOM(F1) * IFOM(P1,F1) , FOM(F2) * IFOM(P1,F2) ,
              FOM(F3) * IFOM(P1,F3)}

or, since each IFOM is equal to 1,

FOM(P1) = max{DFOM(F1) , DFOM(F2) , DFOM(F3)}

and similarly

FOM(P2) = max{DFOM(F4) , DFOM(F5)} .

Therefore,

FOM(OBJECT1) = max{DFOM(F1) , DFOM(F2) , ... , DFOM(F5)}* .
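
The backing-up rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Node class, its field names, and the DFOM values are hypothetical and are not taken from the system described in this report.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    dfom: float = 0.0                             # meaningful only at tip (feature) nodes
    children: list = field(default_factory=list)  # list of (ifom, child) pairs

def fom(node):
    """Tip node: its DFOM. Other nodes: max over offspring of IFOM * FOM."""
    if not node.children:
        return node.dfom
    return max(ifom * fom(child) for ifom, child in node.children)

# DFOM values below are made up for illustration.
dfoms = {"F1": 0.3, "F2": 0.6, "F3": 0.2, "F4": 0.5, "F5": 0.4}
f = {name: Node(name, dfom=value) for name, value in dfoms.items()}
p1 = Node("P1", children=[(1.0, f["F1"]), (1.0, f["F2"]), (1.0, f["F3"])])
p2 = Node("P2", children=[(1.0, f["F4"]), (1.0, f["F5"])])
object1 = Node("OBJECT1", children=[(1.0, p1), (1.0, p2)])

print(fom(object1))  # 0.6, the largest DFOM, since every IFOM is 1
```

With all IFOMs equal to 1, the result reduces to the largest DFOM among F1 through F5, as in the derivation.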

Suppose now that the FOM(OBJECT1) is not high enough (i.e.,
it is not greater than the target FOM value), and suppose also that we
have planning budget left. We would like to consider looking for a related
object to help us find OBJECT1. We wish to consider IFOM(OBJECT1,obj)
for all objects, obj, contextually related to OBJECT1. We can immediately
reject any object, obj, for which IFOM(OBJECT1,obj) is less
than the current FOM(OBJECT1), since even if FOM(obj) is 1 (the maximum
possible), it cannot improve FOM(OBJECT1). (In practice, we will consider
objects whose IFOM with respect to the original object is almost
as high as the FOM of the original, since these estimated utilities are
likely to change somewhat during execution.)

* Those readers familiar with probability theory may recognize some
similarities between the FOM computations and the calculation of the
probability of two dependent events.

Now, suppose two objects, OBJECT2 and OBJECT3, satisfy
the test on IFOM. We grow the tree shown in Figure 10, and allocate
a percentage of the planning budget remaining, based on the relative IFOM
values, to each of the two new objects. Using FOM'(OBJECT1) as the new
FOM value, we have

FOM'(OBJECT1) = max{FOM(OBJECT1) ,
                    IFOM(OBJECT1,OBJECT2) * FOM(OBJECT2) ,
                    IFOM(OBJECT1,OBJECT3) * FOM(OBJECT3)}

with FOM(OBJECT2) and FOM(OBJECT3) calculated as above.

FIGURE 10 EXTENDED PLANNING TREE FOR OBJECT 1
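
The rejection test on IFOM and the revised FOM can be sketched as follows. The function name, the `margin` parameter (standing in for "almost as high"), and all numeric values are assumptions made for illustration only.

```python
def extend_with_related(current_fom, related, margin=0.0):
    """related maps an object name to (IFOM with respect to the goal, FOM).

    Objects whose IFOM falls below current_fom - margin are rejected,
    since even FOM = 1 could not raise the goal's FOM through them.
    Returns the revised FOM and the sorted names of the objects kept.
    """
    kept = {name: pair for name, pair in related.items()
            if pair[0] >= current_fom - margin}
    candidates = [current_fom] + [ifom * fom for ifom, fom in kept.values()]
    return max(candidates), sorted(kept)

related = {
    "OBJECT2": (0.9, 0.8),   # illustrative values
    "OBJECT3": (0.7, 0.5),
    "OBJECT4": (0.4, 1.0),   # rejected: IFOM below the current FOM of 0.6
}
new_fom, kept = extend_with_related(0.6, related)
print(round(new_fom, 2), kept)
```

A nonzero `margin` keeps borderline objects in play, in the spirit of the remark that estimated utilities may shift during execution.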


As each new node is added to the tree, a backward pointer 
to its parent is stored with it. These pointers will be used by the 
GM during execution to recalculate FOMs based on success or failure of 
subgoals. 

Whenever we start planning to find an object, we insert a 
pointer to the corresponding branch of the planning tree on the property 
list of that object in VM. Accordingly, we do not have to plan for 
finding an object more than once. Whenever we reach an object in the 
course of planning, we increment a counter associated with that object. 
When we are finished, we have a count of the number of times an object 
has appeared in the plan. We will use this count in the execution of the 
plan in order to give higher weight to subgoals that can help achieve 
several goals. 

This mechanism allows a rudimentary form of learning. 
Previously successful plans would remain on the object's property list in
VM with an accurate FOM value and be available for subsequent use. The 
system will be designed to always check each node for an existing strategy 
before planning anew. If a plan is available, the FOM stored with that 
plan is used in computing FOMs for higher nodes. After checking the 
subgoals in the plan against other subgoals, we will treat the new branch 
of the tree just like any other. Property list pointers are also a convenient
way to incorporate experimental, manually generated strategies.
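
The bookkeeping described in the last two paragraphs amounts to caching a plan per object and counting how often each object is reached. A minimal sketch, in which a dictionary stands in for the property list pointer in VM and every name is hypothetical:

```python
from collections import Counter

plan_cache = {}          # object name -> stored plan (stand-in for the VM property list)
visit_count = Counter()  # object name -> number of times reached during planning

def plan_for(obj, make_plan):
    """Return the plan for obj, planning anew only if none is stored."""
    visit_count[obj] += 1
    if obj not in plan_cache:
        plan_cache[obj] = make_plan(obj)
    return plan_cache[obj]

planned = []             # records which objects were actually planned for

def make_plan(obj):
    planned.append(obj)
    return "plan for " + obj

plan_for("FLOOR", make_plan)   # reached while planning for the desk
plan_for("FLOOR", make_plan)   # reached again while planning for the table
print(planned, visit_count["FLOOR"])
```

The count can then be used to weight subgoals that serve several goals, and a manually written strategy could be installed simply by placing it in the cache beforehand.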

We continue the planning process until one of several 
possible termination criteria is met: either we run out of budget,
there is no way to improve our current FOM, or the current FOM is good 
enough. Whenever we "terminate" a planning branch, we really only suspend 
it. Thus, if we decide later that perhaps it should have been considered 
further, we can start from where planning was discontinued. This will 
be important should we terminate a subgoal because of budgetary limits, 
and then return to that subgoal along another branch of the tree when 
we have more budget to allocate to it. The process ends by returning the
best FOM calculated and the plan that produced it. We will repeat this
process for any remaining goals on our list.
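
One way to picture "terminate means suspend" is with a Python generator, which keeps its state between calls. This is a sketch of the idea only, not of the planner itself; the candidate FOM values and both function names are invented for the example.

```python
def plan_branch(candidate_foms, target_fom):
    """Generator standing in for one planning branch.

    Each step considers one more alternative and yields the best FOM so far.
    """
    best = 0.0
    for value in candidate_foms:
        best = max(best, value)
        yield best
        if best >= target_fom:      # good enough: stop planning this branch
            return

def run(branch, budget):
    """Advance a branch by at most `budget` steps; return the last FOM seen."""
    best = None
    for _ in range(budget):
        try:
            best = next(branch)
        except StopIteration:       # nothing left to improve, or target met
            break
    return best

branch = plan_branch([0.2, 0.5, 0.9, 0.95], target_fom=0.8)
first = run(branch, budget=1)    # budget runs out; the branch is merely suspended
second = run(branch, budget=5)   # more budget later: planning resumes where it stopped
print(first, second)
```

The three termination criteria appear as the exhausted `budget` loop, the exhausted list of alternatives, and the comparison against `target_fom`.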

At this point we have produced a plan for finding an object 
based on information on hand, and some fairly crude estimates of DFOM 
and IFOM. We have computed an FOM for the desired goal based on the 
apparent best choice of features for acquisition at all nodes. If there 
is not a clear best choice, we can continue evaluation of the most promising
branches, using somewhat more sophisticated (and expensive) criteria.
We will describe this in detail later. 

b. Criteria for FOM 

We now describe some of the criteria to be used in the
computation of the two figures of merit, DFOM and IFOM. The IFOM is,
in essence, a measure of how spatially constraining one object is on 
another. It will take into account the physical relationship of one 
object to another, the probability with which this relationship holds, 
and the amount of effort required to localize the constrained object 
given the constraining one. For the parameter to be meaningful, the 
computation must take into account specific information available in the 
VM. For example, we know that the wall is always adjacent to the floor, 
and that given the wall's location, the amount of work required in finding 
the floor is not very great: just scan down the wall looking for a certain
kind of discontinuity. This means that IFOM(FLOOR,WALL) will be
fairly high. Since the floor is not always adjacent to the wall (there
may be doors), and since it may not always be easy to scan along the
floor to find the wall, IFOM(WALL,FLOOR) will not be as high as
IFOM(FLOOR,WALL). Let us look at the example of a telephone and a table.
Given a certain table (say TABLE1), where we know that there is always
a telephone, the IFOM(TELEPHONE,TABLE1) will reflect the work that may be
required to localize it on TABLE1; this will not be very much, so the
IFOM value will be high. If we know further that the telephone is always
in a certain location on the table, then IFOM(TELEPHONE,TABLE1) will be
even higher. However, suppose we know that there is a single telephone
on some table in a given room, but that there are three tables in the
room. The probability that the telephone is on any given table is only
one-third. The value of IFOM(TELEPHONE,TABLE), reflecting this probability,
will thus be less than IFOM(TELEPHONE,TABLE1).
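
The report does not give a formula for the IFOM. Purely to make the telephone example concrete, the sketch below assumes a toy form, probability times (1 - effort); this form, the function name, and the numbers are all assumptions, chosen only so that the value stays in [0,1] and moves in the directions described.

```python
def ifom(probability, effort):
    """Toy IFOM in [0, 1] (an assumed form, not the report's definition).

    probability: chance that the spatial relation holds.
    effort: relative residual search effort, 0 = free, 1 = prohibitive.
    """
    assert 0.0 <= probability <= 1.0 and 0.0 <= effort <= 1.0
    return probability * (1.0 - effort)

known_table = ifom(probability=1.0, effort=0.2)       # telephone is always on TABLE1
known_spot = ifom(probability=1.0, effort=0.05)       # ... and always in the same place
any_table = ifom(probability=1.0 / 3.0, effort=0.2)   # one telephone, three tables
print(any_table < known_table < known_spot)  # True
```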

The DFOM indicates the advisability of a direct visual 
search for a feature. In Section III-B we summarized the most important 
considerations entailed in selecting acquisition attributes. The first 
consideration was criteriality, the likelihood of the feature being 
present given that the object itself is. If telephones are always black, 
then black is a criterial feature; but, if telephones come in several 
colors, then color becomes less criterial. The second factor concerned 
the uniqueness of the feature in the world; we prefer features that do
not also characterize many other objects likely to be present. 

The DFOM must reflect the degree to which prospective 
features satisfy these two prerequisites. It must also reflect the cost 
per sample of testing for the feature and the expected number of samples 
to be examined. The number of samples is primarily a function of the
size of a feature and the degree of localization within the scene imposed by
prior knowledge. We will use whatever spatial constraints we have on 
the object to "window" the picture (i.e., to limit the area that must be 
searched for the feature). We will then compute the expected size of 
the object in that window. The important criteria here will be the size 
of the feature in three-dimensional space and its expected distance and 
orientation with respect to the sensor. The ratio of the expected size 
of the object to the size of the window determines both the expected 
number of points to be tested and the required sampling density.

The cost of testing each sample depends on the required
detection reliability. Reliability will be estimated using simplified 
detector models that take into account common error conditions affecting 
the cost-reliability of particular modules. For instance, it is known 
that highlights, to be expected on glossy objects, will adversely affect 
the accuracy of both range and hue measurements. The cost of the cheapest 
detector with adequate reliability will be used in calculating the DFOM. 
Special purpose modules, designed to be highly cost-effective in very 
constrained tasks (e.g., a floor finder program) will be selected when 
available. 
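
Choosing "the cheapest detector with adequate reliability" is a filter followed by a minimum. In the sketch below the detector names, costs, and reliabilities are invented for illustration; the detector library's actual contents are not specified at this level of detail.

```python
def choose_detector(detectors, required_reliability):
    """Return the cheapest detector meeting the reliability requirement, or None."""
    adequate = [d for d in detectors if d["reliability"] >= required_reliability]
    if not adequate:
        return None
    return min(adequate, key=lambda d: d["cost"])

# Hypothetical library entries.
library = [
    {"name": "hue_point_test", "cost": 1.0, "reliability": 0.70},
    {"name": "hue_3x3_average", "cost": 4.0, "reliability": 0.90},
    {"name": "range_surface_fit", "cost": 9.0, "reliability": 0.95},
]
print(choose_detector(library, 0.85)["name"])  # hue_3x3_average
```

A known error condition, such as highlights on a glossy object, could be modeled by lowering the affected reliabilities before the call.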

A final consideration in selecting an acquisition feature 
is the prior availability of relevant information in STWM. If, for example,
the color or surface orientation of some samples in the search
window are already known as a result of previous goals, then the expected 
cost of another search is correspondingly reduced. The DFOM for such 
features should be increased accordingly. 

To help coordinate current goals and information already 
in hand, we will store pointers with each detector module (in the detector
library) to all subgoals utilizing them. This record of subgoals
dependent on the same module, together with a history of which ones have 
been executed, will enable the planner to determine whether information 
potentially relevant to a current subgoal is available. Eventually, the 
GM may be able to use these pointers to dynamically redirect information 
not pertinent to the particular subgoal for which the information was 
gathered. 

We will conclude this discussion of FOM computation by
reiterating the fact that the IFOM computations are based primarily on
spatial constraints, while the DFOM computations are based primarily on
direct visual features. However, both rely on some common factors like
feature size and the criteriality and uniqueness of a feature conditioned
on the presence of other objects. IFOM is a function of three main things:
the type of relation between objects, the probability that the relation
holds, and the amount of residual work required to localize the constrained
object given the constraining one. The DFOM is also a function of three
things: the criteriality and uniqueness of the feature, the relative size
of the feature, and the utility of the detector for that feature.

We have by now generated a plan for finding an object 
based on crude estimates of the cost of acquiring objects using their 
strongest visual characteristics. We have not taken into account any 
interactions between goals, or possible alternatives that may be available 
at a node. These factors, along with more precise estimates of FOM, will
be the subject of the following section on plan refinement. 

c. Plan Refinement 

During the refinement stage of planning, we wish to take
a closer look at subgoals that appeared promising during the initial 
phase of the planning. We intend to perform more precise checks that, 
because of cost and level of detail, were inappropriate in the initial 
stages. One important refinement is to obtain more precise localization 
of features in the picture. This localization could require relational 
chains of inference from other more precisely known objects and may 
result in changed FOM values for various nodes. These precise spatial 
bounds will be used again during execution to provide starting points 
and windows for the detector routines. 

So far, our estimated FOM for finding an object has been
based solely on anticipated costs of direct and indirect acquisition.
These estimates must be refined to take account of the expected cost of
corresponding validations. If we have several strong characteristics
of an object, then we know that it will be possible to increase our
confidence in that object fairly easily. If, however, the object only
has a few characteristics, or only weak ones, then it may be more difficult
to validate. Thus, a goal with several good acquisition alternatives
is more promising than one with only a single acquisition feature, and
should have its FOM increased accordingly.

These considerations provide a crude planning estimate of 
the ease of validation, estimating the ease of increasing confidence of 
the desired object, but without concern for distinguishing it from other 
similar ones. During the validation phase of execution, we will consider
which of the remaining easily acquired features distinguish as well
as confirm the object. However, this level of detail is not necessary 
at our present stage of planning. 

During the refinement of the plan, we will also consider 
the fact that several branches of a plan may have subgoals in common 
(e.g., finding the floor might be useful to finding tables, desks, and 
the wall). Since we can distribute the cost of this operation over 
several branches of the plan, we can effectively reduce the cost of the 
individual subgoals. This would increase the FOM of those branches, and 
perhaps of the parent nodes of the branches.

Earlier, we mentioned that a detector might produce useful
information for subgoals other than the one for which it was activated.
We will attempt to anticipate the likelihood of this sharing during
planning, in order to improve the FOMs of the related subgoals.

An important constraint on module sharing is that the 
search spaces of the related subgoals must overlap. A reasonable way to 
group subgoals satisfying this condition is to lay a coarse grid on the 
picture and note objects whose search space overlaps each cell. When 
one of the objects is found, the search space of the other objects should 
be contracted to avoid overlapping the identified portion of the scene. 
Reducing the search space will then cause a corresponding increase in
the FOMs of these remaining objects.
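
The coarse-grid grouping can be sketched as follows, assuming for simplicity that each search space is an axis-aligned rectangle in picture coordinates. The cell size, the rectangles, and the function names are illustrative assumptions.

```python
from collections import defaultdict

def cells_overlapped(box, cell):
    """Grid cells touched by box = (x0, y0, x1, y1), bounds inclusive."""
    x0, y0, x1, y1 = box
    return {(cx, cy)
            for cx in range(x0 // cell, x1 // cell + 1)
            for cy in range(y0 // cell, y1 // cell + 1)}

def group_by_cell(search_spaces, cell=64):
    """Map each grid cell to the objects whose search space overlaps it."""
    grid = defaultdict(set)
    for name, box in search_spaces.items():
        for c in cells_overlapped(box, cell):
            grid[c].add(name)
    return grid

def sharing_candidates(search_spaces, cell=64):
    """Pairs of objects that share at least one grid cell."""
    pairs = set()
    for names in group_by_cell(search_spaces, cell).values():
        ordered = sorted(names)
        pairs |= {(a, b) for i, a in enumerate(ordered) for b in ordered[i + 1:]}
    return pairs

spaces = {
    "FLOOR": (0, 0, 255, 100),
    "DESK": (50, 80, 150, 200),
    "TELEPHONE": (200, 220, 250, 250),
}
print(sharing_candidates(spaces))  # {('DESK', 'FLOOR')}
```

Because the grid is coarse, sharing a cell is only a cheap first test; two search spaces in the same cell may still fail to overlap exactly.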

We have now concluded the initial stage of planning to 
find an object. We have, as a result of this, a planning tree for each 
desired object. At each node of the tree, we have computed a static 
FOM value to guide us in our choice of execution alternatives. As new 
information is acquired, we expect that these values will change, and 
also that the expected execution path will change. We now proceed to 
a discussion of how this tree is used in the execution of the plan. 

4. Execution 

We would like to preface our discussion of the execution 
phase of our system by indicating the principal distinction between our 
concepts of planning and of execution. In planning, the only information 
available to the system is that which is already in VM (due either to 
manual model building or to previous processing) or in the STWM (due to 
recent execution). During execution, we will also perform "planning", 
but this will primarily use new information acquired by processing TV or 
range pictures. That is, during execution we can draw on information 
gained by the application of detectors to images (which may be new or old).

We will begin our description of the execution phase of the system
with a brief overview. Following this overview will be a more detailed
description, and finally an example.

a. Overview 

The execution will start with the selection of the best
(i.e., highest FOM) object to find, from the list of desired objects.
From the plan for finding that object, we will select the best object
to look for directly. We will then choose a part of that object, and
the best feature of that part (the PRIMARY A-FEATURE) for acquisition
in the picture. The image will be examined by sampling coarsely, and
looking for a sample with the appropriate characteristics. When one is
found, the system will attempt to grow the part using the PRIMARY
A-FEATURE and any other useful features of the part.

After acquiring the part, we will begin to validate by 
performing some immediate tests to increase our confidence in the fact 
that we have found the desired part. Validation will then proceed by 
attempting to distinguish the desired object from other objects having 
a similar part. 

When we have sufficient confidence in the object which 
we have found, we will proceed back up the planning tree to attempt to 
satisfy the goal that had this object as a subgoal. When we have either 
satisfied all assigned tasks, or exhausted the budget, we terminate.

During the course of execution, new information will cause 
conditional adjustments in our FOM estimates. We now describe how these 
adjustments may affect the flow of control. The GM will examine information
obtained in the course of execution, decide which goals are affected,
and compute appropriate adjustments to their FOMs. Whenever the
FOM of a subgoal is modified, the new value is backed up the planning 
tree, making FOM adjustments at each level, until a top-level goal is 
reached, or until no further adjustment is necessary. 
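
Backing a changed FOM up the tree through parent pointers, stopping as soon as a level is unaffected, can be sketched as below. The Goal class and the numbers are hypothetical; plan-refinement bonuses are ignored, so each parent's FOM is simply the maximum of IFOM * FOM over its subgoals.

```python
class Goal:
    def __init__(self, name, parent=None, ifom=1.0, fom=0.0):
        self.name, self.parent, self.ifom, self.fom = name, parent, ifom, fom
        self.children = []
        if parent is not None:
            parent.children.append(self)   # backward pointer plus forward list

def update_fom(goal, new_fom):
    """Set goal's FOM, then back the change up until the top or no change."""
    goal.fom = new_fom
    node = goal.parent
    while node is not None:
        backed_up = max(c.ifom * c.fom for c in node.children)
        if backed_up == node.fom:
            break                          # no further adjustment is necessary
        node.fom = backed_up
        node = node.parent

telephone = Goal("TELEPHONE")
desk = Goal("DESK", parent=telephone, ifom=0.5, fom=0.5)
table = Goal("TABLE", parent=telephone, ifom=0.25, fom=0.5)
telephone.fom = 0.25                       # backed up from the desk: 0.5 * 0.5

update_fom(desk, 0.0)                      # e.g., the scanner failed on the desk
print(telephone.fom)                       # 0.125, now backed up from the table
```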

FOM modifications will usually be backed up only when we 
terminate the current subgoal (either successfully or unsuccessfully). 
This will eliminate a tendency to flit from goal to goal, and will thus 
minimize expensive overhead functions. 

At this point, based on the updated FOMs, the planning 
budget must be reallocated and a new subgoal selected. We expect that 
when a subgoal succeeds, the goal that initiated it will usually be the 
most promising thing to do next. However, if another subgoal should 
have a significantly better FOM, it would be selected for attention. If 
no goal appears sufficiently promising (i.e., the FOMs are all too low),
the GM may decide that replanning is the best alternative; it can recall 
the planner, either to awaken suspended nodes or to reevaluate existing 
nodes in a new context. 

b. Acquisition 

The first step in execution will be the distribution of
budget among the set of top-level goals, similar to the resource allocation
for planning. In this case, however, budget will be allocated on
the basis of both I values and FOM values for the goals. Each goal will be
allocated a fraction of the budget equal to the product of its FOM and
interest, normalized by the sum of such products over all the top-level
goals. The goal with the highest allocation will
then be selected for execution. Execution will proceed by descending 
into the planning tree and, at each branch, selecting the subgoal with 
the highest FOM value. Budget passed down from the previous level can be 
distributed among the subgoals, at each node, in a number of ways; for 
instance, in proportion to their normalized FOM values. However, in the 
initial implementation we will simply pass on the total remaining allotment
to the best subgoal at each node. As FOM values change during execution,
remaining budget will be dynamically reallocated to the currently
best subgoal. 
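
The top-level allocation rule, each goal's share being its FOM times its interest value divided by the sum of such products, can be sketched as follows. Goal names and values are invented for the example.

```python
def allocate_budget(goals, budget):
    """goals maps name -> (FOM, interest). Returns name -> share of budget."""
    weights = {name: fom * interest for name, (fom, interest) in goals.items()}
    total = sum(weights.values())
    if total == 0:
        return {name: 0.0 for name in goals}   # nothing is worth pursuing
    return {name: budget * w / total for name, w in weights.items()}

goals = {
    "TELEPHONE": (0.5, 1.0),   # (FOM, I): illustrative values
    "DOOR": (0.25, 1.0),
    "CHAIR": (0.5, 0.5),
}
shares = allocate_budget(goals, budget=100)
first_goal = max(shares, key=shares.get)       # goal selected for execution
print(shares, first_goal)
```

In the initial implementation described above, the same function would not be needed below the top level, since the whole remaining allotment simply passes to the best subgoal at each node.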

This initial phase of execution proceeds down the tree 
until a node is reached that represents an object that can be looked for 
directly. The best feature of the most promising part of the object will 
be chosen to be the PRIMARY A-FEATURE. The rest of the features will be
SECONDARY A-FEATURES, or VALIDATION FEATURES. 

The scanner will be called with the PRIMARY A-FEATURE, 
and the "starting information" for the part. This starting information 
will consist of the window within the picture where we have localized 
the part, and information about the scan, such as the order and density
of sample points. 

The scanner will select the coordinates of the next sample 
point in the picture, and then test the local sample characteristics for 
correspondence with the PRIMARY A-FEATURE. For example, we might be 
looking for a table, using its horizontal surface as the PRIMARY A-FEATURE,
and its color as a SECONDARY A-FEATURE. The starting information may 
cause us to examine only the lower half of the picture, and to sample it 
with a 3 by 3 operator applied at every 15th picture-point. A typical 
(default) sampling raster will be from left to right across the window 
and then bottom to top. A 3 by 3 surface will be fit at each sampled 
location to determine whether a horizontal surface of the appropriate 
height might be present. 
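
The default sampling raster can be sketched as a generator over the window. Assumptions made here: the window is given as inclusive corner coordinates with y increasing upward, and the 3 by 3 surface fit is abstracted into a caller-supplied `test` function; neither detail is specified in the text.

```python
def raster(window, step=15):
    """Yield sample points left to right across the window, then bottom to top.

    window = (x0, y0, x1, y1), inclusive, with y increasing upward (assumed).
    """
    x0, y0, x1, y1 = window
    for y in range(y0, y1 + 1, step):
        for x in range(x0, x1 + 1, step):
            yield x, y

def scan(points, test):
    """Return the first sample point passing the acquisition test, else None."""
    for point in points:
        if test(point):
            return point
    return None

points = raster((0, 0, 30, 15))                # every 15th picture-point
hit = scan(points, lambda p: p == (15, 0))     # stand-in for the surface test
rest = list(points)                            # the scan can resume after the hit
print(hit, rest)
```

Because `raster` is a generator, the scanner can be suspended at a promising sample and restarted later from the same position.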

If the scanner is unable to find any samples with the 
appropriate characterizations, it exits with failure. The GM will then 
cause failure of all nodes in the tree that required that feature. This, 
in turn, might require the FOM values of all the nodes in the tree above 
these pruned nodes to be updated. At this point, the executive will 
choose whether to use a SECONDARY A-FEATURE as an alternate acquisition 
test, or to pursue another goal. The choice depends on the revised FOM 
values. 

If the SCANNER succeeds in finding a suitable surface 
characterization, the successful sample will be checked to see that it 
has the appropriate SECONDARY A-FEATURES. In the case of a table, if a 
sample had the right surface height and orientation, the next step would 
be to check the color. 

If the SECONDARY A-FEATURES all match, then the global 
characteristics of the part must be checked. If any of the secondary 
features (or subsequent tests) fail, the GM will still make note (in a 
"sample table") of those characterizations that do succeed. Later, the
FOMs of other subgoals characterized by the successful features can be 
increased. 

Now, assuming that a good candidate for the desired part 
is found, the scanner will be suspended (so that it can be restarted 
later) and the global attributes of the part extracted. We will base 
our discussion on surface parts that appear as "regions." Parts need not 
be surfaces (to locate the wall it may be wise to search for the
wall-baseboard boundary); however, the remarks below will still apply. In
general, the system will need to use a priori information to successfully 
extract a part. Thus, it might seek to extract a region with specific 
color, surface orientation, and textural attributes. A detailed discussion
of goal directed region extraction follows in Section III-D.

c. Validation

The initial step in validation is to compare the global 
attributes of the extracted region with those of the desired part. Some 
of the region properties that can be compared are size, shape, and uniformity
of color. This initial validation step will establish a basic
level of confidence, indicating that what was found at least looks like 
the desired part. 

If any of these global tests fail, several options are 
open, depending on the current level of confidence in the part. First, 
if the overall confidence is low, the validation should fail. In this 
case, control returns to the scanner, which is restarted from its suspended
state. The GM first will update FOM values of goals which are
affected by the information obtained during the suspension. 

The second option for marginal confidence is to examine 
the reasons for failure of the test. Thus, the system might hypothesize 
and check for occluding objects. If the hypotheses prove to be true (or
at least plausible), then validation could continue with other features. 
Alternatively, it may be appropriate to vary the criteria used in region 
extraction and try regrowing it. We would then reenter the validation 
stage with the new region. 

The final choice is simply to ignore the failure and continue
validating other subgoals. We would, of course, select this option
if the confidence were already high enough to accept the region as being 
the required part. However, it also seems reasonable to suspend, at 
least temporarily, any error diagnosis whenever there are other reasonable
alternatives to pursue. The need for diagnosis effectively makes the 
original subgoal very expensive. Therefore, it should be abandoned until 
all the more promising alternatives have been exhausted. 

Now that the extracted region is known to possess the 
visual attributes of the described part, it is necessary to ascertain 
whether that part, in fact, belongs to the desired object. To accomplish 
this, we will compare the object with other objects in VM having similar 
appearing parts. (These objects make up the "ambiguity set" of the desired
object.) A set of object attributes will be selected, which distinguish
the desired object from others in the ambiguity set. These
attributes will include parts and features which the desired object must 
have, and those which it specifically must not have. 

Even if the ambiguity set is empty, there is still the 
possibility of accepting an unknown object that happens to have some 
visual attributes of the desired object. To minimize the chance of such 
a mistake, it is desirable to check other parts of the desired object. 
Additional parts of the object, beyond those necessary to confirm recognition,
may also be needed to locate boundaries more precisely.

The relevant criteria for deciding which feature to pursue
in validation happen to be the same as those for selection of acquisition
attributes, i.e., a feature that is criterial, distinguishing, and easy
to obtain. Therefore, both the selection and the search for validation
features can be accomplished by recursively applying FIND to the remaining
features of the desired object, in the context established by those
already validated. Features will again be selected in an order determined
by the product of their estimated FOMs and I values. Here, I values
reflect the extent to which each validation goal raises confidence in
the object.

The limited context simplifies these subordinate finds 
in two important ways. First, the area that must be searched is restricted
to the immediate vicinity of the previously acquired parts.
Second, the selected feature need only distinguish the desired object 
from remaining members of the ambiguity set. 

Validation of a tree branch terminates when either the 
allotted budget is exhausted, the required confidence in the goal is
achieved, or the confidence drops below a failure level. If validation 
succeeds, control passes to the next most promising (and adequately 
funded) subgoal. If it fails, control returns to the scanner, after the 
GM has updated affected FOM values. 
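
The validation loop and its three termination conditions can be sketched as below. Several things here are assumptions rather than part of the design: the thresholds, the per-feature cost field, and the additive confidence update supplied by the hypothetical `check` callback.

```python
def validate(features, check, budget, confidence, accept=0.9, reject=0.2):
    """Try validation features in order of FOM * I until a stop condition.

    features: list of dicts with "name", "fom", "interest", "cost".
    check(name): returns a confidence change (positive on success).
    Returns (outcome, confidence, names tried).
    """
    ordered = sorted(features, key=lambda f: f["fom"] * f["interest"], reverse=True)
    tried = []
    for f in ordered:
        if confidence >= accept or confidence < reject or f["cost"] > budget:
            break
        budget -= f["cost"]
        confidence += check(f["name"])
        tried.append(f["name"])
    if confidence >= accept:
        return "succeeded", confidence, tried
    if confidence < reject:
        return "failed", confidence, tried
    return "undecided", confidence, tried      # budget or features ran out

features = [
    {"name": "legs", "fom": 0.5, "interest": 0.5, "cost": 2},
    {"name": "size", "fom": 0.8, "interest": 1.0, "cost": 1},
]
outcome, conf, tried = validate(features, lambda name: 0.25, budget=5, confidence=0.5)
print(outcome, conf, tried)
```

On "succeeded" control would pass to the next most promising subgoal; on "failed" it would return to the scanner after the FOM updates.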

d. Example 

We would like to conclude this discussion with an example 
illustrating a possible flow of execution. The goal will be: Find a 
telephone. Let us assume that during planning the tree shown in Figure 11
was generated. In this tree, the FOM values (chosen arbitrarily for
purposes of illustration) are written next to the nodes, and the IFOM
values (also illustrative) are written beside the branches. Note that
in several cases the FOM value shown for a node is greater than what
would be obtained from backing up the maximum descendant FOM. Such an
increment would have been added during plan refinement to reward nodes 
with several good alternative subgoals (e.g., the desk can be found by 
looking for its top or side) or those that are subgoals to several other 
nodes (e.g., the FOM of the floor is increased due to the fact that it 
appears twice). During the development of this example, the quantitative 
adjustments of FOMs will be somewhat arbitrary; what we consider important 
here is the qualitative effect of these adjustments. The discussion will 
not include budget allocation; assume that sufficient budget has been 
allotted to perform the task. 

Now, proceeding down our tree, we see that the best way 
to locate the telephone is to first find a desk. In looking for a desk, 
the top is the best part for acquisition, and the color, dark gray, is 
the most promising feature. Therefore, we start the scanner (with any 
available starting information) in an attempt to find a dark gray sample. 
Let us assume that it fails. The GM now changes the FOMs of nodes requiring
a dark gray region to zero and prunes those branches. After the
adjustment of all affected FOMs, we have the tree shown in Figure 12.

The desk is still the most promising way to find a telephone,
but this time the floor should be used as an aid to finding the
desk. The scanner thus attempts to locate samples belonging to a horizontal
plane of height 0. Assume it succeeds both in finding such a
sample, and in growing a surrounding region based on these surface attributes
and the color (buff). The GM now readjusts the FOMs of affected
subgoals to reflect success in finding the floor. 

In the new tree, shown in Figure 13, the desk remains the 
best way to proceed to our goal of a telephone. We should point out that 
this is because the side of a desk is more distinctive than the leg of a 
table. The scanner now attempts to locate a horizontal plane 30 inches 
above the floor, in search of the desk top. Assume that a sample of such
a plane is found, but with the wrong color (even though color was removed
as an acquisition feature, it is still a requirement for validation).
This fact is noted in the sample table, and the scanner continues. It
ultimately terminates, never having found a horizontal plane of the appropriate
color. However, the several samples of a horizontal plane at
a height of 30 inches that were found will increase the FOM of the 
HPLANE(30) feature of the top of the table. The resulting planning tree 
is shown in Figure 14. The top of the desk has been abandoned as an 
acquisition feature, but the side still exists as a subgoal. If the 
side is ever located, then the characteristics of the top can still be 
used in validation (with more tolerant acceptance thresholds). 

The top of the table is now the strongest subgoal. The 
sample table provides the location of the samples of horizontal plane 
that were observed, and a check indicates that the color matches that 
required for the top of the table--light gray. The complete region is 
grown, and its size and shape are successfully compared with the expected 
dimensions of the top. The validation of the table is completed by 
finding its legs. Successful validation of the table causes another 
flurry of FOM reevaluation, resulting in the new tree of Figure 15. 

It is now cost-effective to look directly for the telephone 
within the boundaries of the table top, using the "black side" as the best 
feature. This color is located, and it is verified that the sample be¬ 
longs to a vertical surface. The region is then grown and found to have 
appropriate shape. Finally the dial is sought to complete validation 
and pin down the location of the telephone. (Of course, by now our 
caller has hung up!)

This example shows how the output from the planning stage 
was processed during execution. The GM performed substantially the same 
evaluations that were required during planning. While there was no actual
reincarnation of the planner at a suspended node, it would probably have
been necessary to resort to this if the table had not been found. At
each stage, the executive selected the best goal and the best acquisition
features to pursue. We expect to implement a system that functions
essentially in this way in the coming year.
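
The selection and pruning steps of this example can be sketched in a few
lines. The sketch below is ours and is only illustrative: the names
Subgoal, record_failure, and best_subgoal are hypothetical, the FOM values
are invented, and only the pruning of failed features is modeled (the GM
also readjusts the FOMs of related subgoals after a success).

```python
from dataclasses import dataclass

@dataclass
class Subgoal:
    name: str       # surface to look for, e.g. "desk top"
    feature: str    # acquisition feature the scanner would use
    fom: float      # figure of merit; larger means more promising

def record_failure(subgoals, feature):
    """Zero the FOM of every subgoal that needs a feature the scanner failed to find."""
    for goal in subgoals:
        if goal.feature == feature:
            goal.fom = 0.0

def best_subgoal(subgoals):
    """Return the unpruned subgoal with the highest FOM, or None if all are pruned."""
    live = [goal for goal in subgoals if goal.fom > 0.0]
    return max(live, key=lambda goal: goal.fom) if live else None

# Invented values for illustration; real FOMs would come from the planner.
tree = [
    Subgoal("desk top", "color=dark gray", 0.8),
    Subgoal("floor", "hplane=0", 0.6),
    Subgoal("table top", "hplane=30", 0.4),
]
```

With these values the executive would first pursue the desk top and,
after record_failure(tree, "color=dark gray"), would turn to the floor.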

D. Goal Directed Scene Segmentation

One of the most interesting aspects of the strategy outlined above 
is the degree to which high level recognition and low level feature ex¬ 
traction have been integrated; one knowledge base and one basic strategy 
concept are used at all levels of the perceptual process, from the selec¬ 
tion of attributes for region growth to the selection of distinguishing 
features for object recognition. 

In this section, we want to elaborate on our goal directed approach 
to scene segmentation (i.e., region extraction) based on multisensory 
data. The success of any perceptual system is largely dependent on the
accuracy and appropriateness of the visual features it extracts. We
believe we have found a straightforward way of utilizing high level 
knowledge to improve substantially the reliability with which regions 
can be extracted. 

1. Background 

"Segmentation" is the process of partitioning a scene into in¬ 
teresting figures distinguished from the background. This process is 
essential to virtually every perceptual function. Robot projects have 
hitherto largely avoided the problems of segmentation by choosing texture¬ 
less environments in which both objects and background could be adequately 
characterized by regions of homogeneous brightness. A typical algorithm 
for segmenting such an environment was: 

• Group contiguous raster points into regions of 
identical brightness. 

• Merge adjacent regions if the contrast over a 
suitable fraction of their common boundary is 
sufficiently low. 

• Merge recursively until no more combinations are 
possible. 
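
For concreteness, the three steps above might be coded as follows. This
is a minimal sketch of the general bottom-up scheme, not a reproduction of
any particular earlier program; the function name segment and the
parameters max_contrast and min_fraction are our own assumptions, and
4-connectivity is assumed for "contiguous."

```python
def segment(image, max_contrast=1, min_fraction=0.5):
    """Bottom-up segmentation of a 2-D list of brightness values.

    Returns a 2-D list of region labels (arbitrary integers).
    """
    height, width = len(image), len(image[0])
    parent = list(range(height * width))   # union-find forest over raster points

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def neighbours():
        """Yield each 4-connected pair of raster points once."""
        for y in range(height):
            for x in range(width):
                if x + 1 < width:
                    yield (y, x), (y, x + 1)
                if y + 1 < height:
                    yield (y, x), (y + 1, x)

    # Step 1: group contiguous points of identical brightness.
    for (y0, x0), (y1, x1) in neighbours():
        if image[y0][x0] == image[y1][x1]:
            parent[find(y1 * width + x1)] = find(y0 * width + x0)

    # Steps 2 and 3: merge weak-boundary neighbours until nothing changes.
    merged = True
    while merged:
        merged = False
        borders = {}   # (region a, region b) -> contrasts along common boundary
        for (y0, x0), (y1, x1) in neighbours():
            a, b = find(y0 * width + x0), find(y1 * width + x1)
            if a != b:
                key = (min(a, b), max(a, b))
                contrast = abs(image[y0][x0] - image[y1][x1])
                borders.setdefault(key, []).append(contrast)
        for (a, b), contrasts in borders.items():
            weak = sum(1 for c in contrasts if c <= max_contrast)
            if weak / len(contrasts) >= min_fraction and find(a) != find(b):
                parent[find(b)] = find(a)
                merged = True

    return [[find(y * width + x) for x in range(width)] for y in range(height)]
```

Note that the only knowledge used is brightness and adjacency; nothing in
the procedure reflects what object is being sought.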

Such an algorithm cannot cope with intensity variations due to 
color, texture, and range that characterize most real-world scenes. A 
superficial solution is simply to incorporate additional primitives (e.g., 
texture operators) and/or sensory data (e.g., color and range) into the 
initial similarity judgements. However, this solution exposes serious 
limitations that are deeply rooted in the bottom-up nature of the strategy.

In a bottom-up approach, one would lump the new primitives and
sensory data together into a weighted attribute vector and judge
similarity by distance in a multidimensional space. However, in any
particular context, some of these attributes will be irrelevant, causing
unnecessary fragmentation of surfaces that are actually homogeneous with
respect to an appropriate attribute set.

This fragmentation, in addition to the fragmentation caused by 
ambiguities inherent in local similarity judgements, must again be 
resolved by merging similar regions at a subsequent level of processing. 
However, in naturally textured scenes, it is necessary to consider
combining proximate as well as contiguous regions, according to criteria
that may include size, shape, spatial orientation, spatial relationships 
with other regions, and so on, in addition to the original attribute 
vector. A lumped computation here will obviously include inappropriate 
factors. An alternative is to try various combinations of attributes 
and see which provides the "best organization" according to some Gestalt 
type criteria. Unfortunately, this alternative is infeasible combina- 
torially with a serial computer. 

2. Classification Approach 

Scene segmentation at all levels must be responsive to higher 
level intent and expectations. In our goal directed vision system, 
regions are grown hierarchically by selecting more primitive regions 
satisfying specific criteria associated with the object of interest. 

The lowest level regions are grown by selecting proximate image samples 
having local attributes (i.e., color, surface orientation) characteristic 
of a particular object. 

Growth by selection brings high level knowledge to bear on the 
problems of low level feature extraction in an intuitively nice way. 
Regions can be grown on the basis of attributes that both unify the de¬ 
sired object and distinguish it from surrounding surfaces. Our basic 
strategy for choosing simple distinguishing features can be directly 
applied to choosing selection criteria for scene segmentation. In fact, 
the same criterion originally used for acquisition will often be 
appropriate as a selection criterion for extracting the surrounding
surface. This criterion may be improved, however, by using knowledge of
surfaces likely to border physically on the desired surface (e.g., door 
is adjacent to wall and floor, door knob is surrounded by door). 

When specific surface adjacencies are known, the selection 
decision can be based on a direct discrimination of their distinguishing 
attributes. A sample is accepted if its relative feature-space distance 
is closer to the characteristics of the desired object than to those of 
the known alternatives. Such discrimination tests tend to be much more 
tolerant and reliable than one-sided acceptance thresholds. 
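
The difference between the two kinds of test can be illustrated with a
short sketch. The feature vectors below are invented color values, and the
function names and the use of a scaled Euclidean distance are our
assumptions; any feature-space distance could be substituted.

```python
import math

def distance(sample, prototype, scale):
    """Scaled Euclidean distance between a sample and a surface prototype."""
    return math.sqrt(sum(((s - p) / k) ** 2
                         for s, p, k in zip(sample, prototype, scale)))

def accept_by_threshold(sample, desired, scale, threshold):
    """One-sided test: accept only if the sample is close enough to the desired surface."""
    return distance(sample, desired, scale) <= threshold

def accept_by_discrimination(sample, desired, alternatives, scale):
    """Two-sided test: accept if the sample is nearer the desired surface than any known neighbor."""
    d_desired = distance(sample, desired, scale)
    return all(d_desired < distance(sample, alt, scale) for alt in alternatives)

# Invented (red, green, blue) values for a brown door and a buff wall.
door, wall = (120, 80, 40), (220, 200, 160)
shaded_door_sample = (150, 105, 70)
unit_scale = (1, 1, 1)
```

Here the shaded sample lies about 49 units from the door prototype and
about 148 from the wall, so it fails a tight acceptance threshold of 30
but is still correctly assigned to the door by discrimination.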

The optimal discrimination criteria will usually be directionally 
dependent. For example, when extending a door region downward, it may 
be best to ascertain whether the local orientation of the next sample is 
more nearly vertical or horizontal (the latter indicating a floor). 

However, sideways expansion might have to be based on a brown versus 
buff color discrimination, since the wall is also vertical, and probably 
at an unknown angle to the door. 

Surface adjacencies can be determined empirically when a 
region, in the process of growth, intersects a previously identified 
surface. The common border, which was originally obtained by criteria 
selected independently for each surface, can then be refined by
reclassifying samples in the vicinity according to a discrimination
criterion based on both surfaces.

Many other characteristics of the desired object can be brought 
to bear as additional constraints governing sample selection. For example, 
the area from which samples should be selected is bounded by the expected 
region size, while texture scale specifies the required sample density. 

The acceptance thresholds for selection can be set from knowledge of the 
expected uniformity of an object's surface attributes and of expected 
distortions introduced by the associated detection processes. Since the
resulting thresholds are absolute, selection is not subject to the
ambiguity of local similarity judgements.

3. Coping with Texture 

Textured surfaces provide the clearest illustration of the 
advantages of a classification approach to scene organization. It is 
simply not feasible to extract arbitrarily textured surfaces, bottom-up, 
on a serial computer because of the impossibly large number of organiza¬ 
tional hypotheses that would have to be tested to discover the criterial 
pattern. Consider, for example, the effort that might be required to 
determine, bottom-up, that the floor region should not be segmented ac¬ 
cording to the shape of linoleum streaks. 

The combinatorial problems associated with serially discovering 
two-dimensional patterns can be overcome by looking top-down for the 
known criterial attributes of a desired surface. Raster points can be 
selected in groups, satisfying explicit statistical or descriptive cri¬ 
teria, such as a required distribution of colors within a specified 
radius, or even a detailed prototype micropattern with specified spatial 
relationships among the colors. 

Many textures admit to a hierarchical organization, consisting 
of primitive regions that are grouped into subpatterns, which in turn 
are grouped into patterns, and so on. For these surfaces, the selection 
process can be applied recursively. Thus, a checkered texture might be 
organized by first selecting contiguous samples of specified colors to 
get the primitive regions (i.e., the squares) and then selecting the set 
of resulting regions that are contiguous, are of appropriate size and 
shape, and alternate in color. 

This approach is applicable to organizing several objects in
our scenario. For example, the floor can be grown by collecting sets of 
contiguous brown or black samples (corresponding to linoleum streaks) 
and then collecting the set of such regions with appropriate size (shape 
arbitrary) that are surrounded by buff. Similarly, a picture could be 
extracted within the context of a vertical wall plane by selecting a 
contiguous set of nonbuff regions and small buff regions completely 
surrounded by non-buff ones. 

Conceptually, the selection process can be recursively applied 
through many levels of hierarchy to extract arbitrarily complex patterns. 
However, in limited contexts, a single level organization based on simple 
distinguishing features of the desired texture may prove more effective. 
For example, a highly textured object on a plain surface might be best 
extracted as a region of many small regions; the detailed structure of the 
texture is irrelevant here, and can be ignored. In another environment, 
the same object might be most easily extracted by selecting samples on 
the basis of their proximity to a distinguishing color that happened to 
be a component of the texture. 

The above representations are admittedly ad hoc and of limited 
generality, but they provide optimum knowledge-based partitioning in 
those environments. Region growth, like recognition, can often be ac¬ 
complished more cheaply and reliably using simple discriminations between 
context limited alternatives than by exhaustive description. 

4. Operational Details 

Most surfaces can be extracted on the basis of several inde¬ 
pendent criteria, such as color, texture, or orientation. In practice, 
we shall choose the simplest logically sufficient set of attributes to 
grow the initial region, saving the remaining alternatives for error re¬ 
covery. The selection criteria for each attribute can be combined into 
a weighted vector or logic function which is used to test each sample
point. A logic function seems preferable because absolute constraints
like "must" and "must not" can be applied to specific attributes and
because the problem of scaling unrelated criteria is bypassed.
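
A logic function of this kind might look as follows. The sketch is a
hypothetical illustration: make_selector, the attribute names, and the
one-inch height tolerance are our assumptions, and a sample is represented
simply as a dictionary of measured attributes.

```python
def make_selector(must=(), must_not=()):
    """Build a sample test from absolute "must" and "must not" constraints."""
    def select(sample):
        return (all(test(sample) for test in must)
                and not any(test(sample) for test in must_not))
    return select

# Hypothetical criteria for a floor sample: horizontal, at height zero
# (within an assumed tolerance), and not the dark gray of the desk.
is_floor_sample = make_selector(
    must=[lambda s: s["orientation"] == "horizontal",
          lambda s: abs(s["height"]) <= 1.0],
    must_not=[lambda s: s["color"] == "dark gray"],
)
```

Each constraint is expressed in the units of its own attribute, so no
relative weighting of color against height or orientation is required.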

Regions will initially be grown with coarse resolution, relying 
on crude measures of size and shape to eliminate cheaply many bad hypothe¬ 
ses. Detailed boundaries can then be obtained, if necessary, by using 
the attributes of the crude surfaces as classification criteria. Addi¬ 
tional image samples obtained at finer resolution are simply assigned 
to whatever neighboring surface they most closely resemble. 

5. Error Recovery 

Although our classification approach can be expected to extract 
regions more reliably than past bottom-up techniques, we foresee the
continuing necessity of an iterative growth process, guided by feedback 
from unsuccessful validation tests. 

a. Background 

Past attempts to graft such feedback onto bottom-up systems 
have largely failed because knowledge of an error was often available 
only after a global recognition decision failed. This was too far re¬ 
moved from the source of error to pin-point the cause explicitly. It 
was also difficult to model conventional region growing algorithms so as 
to establish the relations between detector idiosyncrasies, known object
attributes, region-grower parameters, and the resulting regions.
Consequently, it was difficult both to diagnose the cause of suspected
errors and to alter parameters given an assumed cause.

By contrast, our goal directed perceptual strategy attempts
to validate the expected attributes of a region (e.g., size, shape,
uniformity) as soon as it is grown. For example, evaluation will be
performed on regions extracted at each level of a hierarchically organized
texture. Suspected errors can be directly related to specific selection
criteria, based on world knowledge and pragmatic experience with the
various sensory processes. Feedback from this diagnosis is then treated
as simply another goal directed factor influencing the refined selection
criteria.


b. Diagnostics 

The type of knowledge-based diagnostic process we envisage
is illustrated by the following list of plausible explanations
and corresponding recovery strategies for a region that is smaller than
expected:


(1) Occlusion--This hypothesis is easily
established given range information;
occlusion is likely when the visual
boundary of a selected region is bordered
by samples of nearer range. Additional
samples that satisfy the classification
criteria and are within the expected size
boundary, but are not contiguous to the
originally selected set (due to occlusion),
can then be included in a refined size
estimate with reasonable confidence.

(2) Holes--Explanations for holes in a supposedly
homogeneous region, not explained by occlusion,
can be sought by checking whether any attributes
of the expected region excite known limitations
in the sensory domains used for selection. For
example, color data can be anticipated to be
unreliable for pastel (weakly saturated) objects,
and at highlights expected on glossy objects.
Range data will also be unreliable at highlights
as well as on dark or very oblique surfaces
(because of inadequate reflected light). Such
explanations are easily checked and can often
be rectified by looking for continuity of an
alternative attribute not involving the
affected sensor.

(3) Gradients--Illumination and texture gradients
frequently distort homogeneous surfaces due to 
artifacts inherent in the imaging process. As 
a result, static acceptance thresholds often 
fail. Suspected gradients can be extrapolated 
to correct the selection threshold as a function 
of position. One can also obtain the surface 
orientation with range data and use it with an 
appropriate imaging or illumination model to 
compensate for the gradient analytically. 

Brice and Fennema⁵ suspected intensity gradients
whenever the boundary of a region was weak 
(i.e., low contrast) and wiggly, a so-called 
quantization contour. Contiguous regions with 
such boundaries were merged since, individually, 
they did not correspond to known objects. 
Although this heuristic was originally intended 
for use in a bottom-up algorithm, the charac¬ 
teristics "weak" and "wiggly" can be used by 
our system like any other region grouping 
criteria. 

A more general heuristic is based on the fact 
that quantization contours are unlikely to 
affect two independent sensory modalities at 
the same points on an object. Thus, continuity 
in another attribute (e.g., surface orientation) 
can justify ignoring weak boundaries among the
original (e.g., color) regions. 

(4) Wrong Object--The most obvious cause of error
is simply that an undesired surface was acquired.
The global attributes of the extracted region
can be checked to determine if they correspond
to any surface of an actual known object. If
so, the surface should be regrown based on local 
surface attributes that distinguish this other 
object from the desired one. Where no such 
attributes exist, discrimination must be accom¬ 
plished in terms of other parts of the objects, 
by a higher level validation strategy. 

These error hypotheses are only representative. Similar
error recovery lists have been compiled for regions that are too large 
or of the wrong shape. 

c. Diagnostic Strategy 

Diagnosis will be performed by systematically investigating 
possible errors. The order of examination will be empirically based on 
the frequency with which each error is observed in the current environ¬ 
mental context. We have yet to decide whether statistics will be kept 
only for a general class of errors (e.g., wrong object) or for very common 
specific errors (e.g., floor often merged with wall due to similar color). 
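
One simple way to realize such an empirically ordered examination is
sketched below. It is an assumption-laden illustration only: the class
name, the idea of keying the statistics by a context label, and the
hypothesis names are ours, and the question of how finely errors should be
classified is left open, as in the text.

```python
from collections import Counter

class DiagnosticScheduler:
    """Order error hypotheses by how often each was confirmed in a given context."""

    def __init__(self, hypotheses):
        self.hypotheses = list(hypotheses)   # default order of examination
        self.counts = Counter()              # (context, hypothesis) -> confirmations

    def record(self, context, hypothesis):
        """Note that a hypothesis was confirmed as the cause of an error."""
        self.counts[(context, hypothesis)] += 1

    def order(self, context):
        """Most frequently confirmed first; ties keep the default order."""
        return sorted(self.hypotheses,
                      key=lambda h: -self.counts[(context, h)])

scheduler = DiagnosticScheduler(["occlusion", "holes", "gradients", "wrong object"])
scheduler.record("office", "wrong object")
scheduler.record("office", "wrong object")
scheduler.record("office", "holes")
```

Because the sort is stable, hypotheses that have never been observed in a
context are still examined, in their default order, after the common ones.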

It may seldom pay actually to pursue such detailed diag¬ 
noses. Diagnosis is most important in cases where high confidence re¬ 
quirements demand a positive explanation for errors or where exact surface 
boundaries are needed (e.g., for manipulation). Otherwise, it should be 
more effective simply to exercise available strategy options without re¬ 
gard to the cause of original failure. For example, the system could 
attempt to regrow the region using other attributes and/or sensory mo¬ 
dalities. Alternatively, the initial failure may make it more desirable 
to attempt validation in terms of other surfaces. In either case, sub¬ 
sequent successes or failures should quickly establish a definitive con¬ 
fidence level. Tests for the most common error conditions (e.g., for 
occlusion or inconsistency of regions extracted with different attributes) 
could be built into the initial growing process to increase its basic 
reliability. 

E. Multisensory Data 

An important goal of our perception research is to make effective 
use of multisensory data. The initial system is designed to work with 
coordinated arrays of range values and intensity values seen through a
number of color filters. In this section we summarize our progress in 
interpreting range and color data. We also describe a technique for 
interactively generating coordinated arrays of color and range data which 
we will use to obtain experimental data until our new rangefinder, to be 
described subsequently, is available. 

1. Analysis of Range Data

Range data should ultimately allow simple solutions to some of 
the most confounding problems now faced by passive vision systems. We 
are currently experimenting with algorithms to obtain the position and 
orientation of a surface element for use in acquisition and goal directed 
region extraction as described above. As a step toward extracting
additional useful information, we have analyzed some general problems
concerning the low level interpretation of range data as planar surfaces.

We expect this analysis to lead to special purpose routines for rapidly 
extracting certain common types of surfaces (e.g., a horizontal surface 
of known height). Moreover, the analysis may provide improved algorithms 
for bottom-up scene segmentation. A reliable, a priori knowledge of a
principal planar surface in a scene (e.g., the floor) could help the
planning algorithms immensely in tasks requiring some description,
especially when an object has to be found in an unknown environment.

The analysis is divided into five topics: (1) alternative 
methods for scanning and measuring quantized range data; (2) extraction 
of image points belonging to horizontal and vertical surfaces; (3) clas¬ 
sification of planar-region image boundaries and methods for finding 
points along those boundaries; (4) adverse effects of errors in the range 
data on the resulting image boundary points; and (5) line-fitting methods 
for determining the vertical, horizontal, and three-dimensional boundaries 
in the scene. The analysis, whose details are described elsewhere,⁷ is
summarized as follows. 

a. Range Data

Referring to Figures 16(a) and 16(b), two types of image
coordinates, centered at the range finder, are distinguished: spherical,
$r$, $\beta_p$, and $\varphi_p$, whose angular increments $\Delta\beta_p$
and $\Delta\varphi_p$ are constant; and planar, $r$, $x_p$, and $z_p$,
whose increments $\Delta\beta_p$ and $\Delta\varphi_p$ are tangent
adjusted so that the increments $\Delta z_p$ and, for given $z_p$,
$\Delta x_p$ are constant. The range data are confined to a solid angle
defined by $(|\beta_p| \le \beta_{pm};\ |\varphi_p| \le \varphi_{pm})$.

There are two types of optical range finder to consider, each
consisting of a transmitter and a receiver of a light beam: one is
based on the trigonometry of transmitter-receiver-object triangulation;
the other is based on splitting a transmitted amplitude modulated light
beam into two beams and measuring the phase shift between one of them
and the scattering of the other from the object. (Both types of range
finders are discussed more fully in Section V.) For a triangulation 
range finder, missing data (called "shadow gaps") occur for some image 
points (I,J) if the corresponding object points are seen by the receiver, 
but are not illuminated by the transmitter (I and J are the indices of 
the array column and row, respectively). 

Six types of errors in the measured range data, $r^*(I,J)$,
are recognized: inherent inaccuracy of the range finder, shadow gaps,
data dropout, quantization error, noise, and gross sporadic error.

b. Image Points of Horizontal and Vertical Surfaces 

A general plane equation, using range-centered image
coordinates, has been derived and then simplified for horizontal surfaces
and the floor, and for vertical surfaces. Consequently, a method is
proposed for fast extraction of those image points belonging to either
the floor or a horizontal surface of height $z_H \pm \Delta z_H$. The
method is based on the observation that, for these surfaces, the function
$r(\beta_p)$ for a given $\varphi_p$ is symmetrical about its minimum
value, $r_{min}$, at $\beta_p = 0$ and is so flat in the range
$-\beta_{pm} \le \beta_p \le \beta_{pm}$ that it may be approximated
by a constant. A floor point is detected if
$r^*(\beta_p) \ge \alpha\, r_{min}$, where $\alpha < 1$ is a safety
factor. This algorithm was applied to ideal range data (see Section
III-E-3) of a scene consisting of a floor, a box, a polyhedral object,
and a wall. Using $\alpha = 0.98$, the results are shown in Figure 17.
Computed floor points that agree with the true ones are marked by "F,"
and those computed points that differ from the true ones are marked
by "D." Cells marked by "B" indicate surface boundary points, to be
discussed in the following section.

FIGURE 16 RANGE-FINDER CENTERED IMAGE COORDINATES

Referring to Figure 18, a point on a horizontal surface
below the horizon is detected if $r^*(\beta_p)$ is between
$r = r_{min}$, corresponding to $z_H + \Delta z_H$ and $\beta_p = 0$,
and $r = r_{max}$, corresponding to $z_H - \Delta z_H$ and
$\beta_p = \beta_{pm}$. Additional tests are employed in order to
eliminate wrong points that pass this test. A method is also proposed
for detecting horizontal
surfaces of unknown heights, finding their heights, and extracting their 
image points. The method is based on computing z for each data point, 
quantizing the z values into a one-dimensional array of cells, and de¬ 
tecting cells with high counts. To prevent missing a high count due to 
quantization, this process is performed twice, using two arrays that are 
offset by half a cell relative to each other. 
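
The double-array count for surfaces of unknown height can be sketched as
follows. We assume that the height z of each data point has already been
computed from the range data; the function name, the cell size, and the
count threshold are our own illustrative choices.

```python
import math
from collections import Counter

def detect_heights(z_values, cell=2.0, min_count=5):
    """Find heights where many data points cluster.

    Counts are accumulated in two arrays of cells offset by half a cell,
    so that a cluster straddling a cell boundary in one array falls
    inside a single cell of the other. Returns {cell center: count}.
    """
    hits = {}
    for offset in (0.0, cell / 2.0):
        counts = Counter(math.floor((z - offset) / cell) for z in z_values)
        for index, n in counts.items():
            if n >= min_count:
                hits[offset + (index + 0.5) * cell] = n
    return hits

# Six samples of a surface near 30 inches, straddling the boundary at z = 30.
samples = [29.6, 29.8, 29.9, 30.1, 30.2, 30.4]
```

With a 2-inch cell, the unshifted array splits these samples three and
three, and neither cell reaches the threshold; the shifted array collects
all six in the cell centered at 30.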

The above method can be extended to detection of vertical 
planes. Here, the x and y coordinates of all the data points are first 
computed. High density clusters of (x,y) points are then fitted into 
straight segments, using a window-count method or a modified Hough 
transformation method. 

c. Image Points of Different Boundary Types

Five types of region boundaries are distinguished: frame
boundary (of the range data "picture"), jump boundary (where one object
occludes another), shadow boundary (along a region where range data are
missing), known region boundary (enclosing a region, such as the floor,
a horizontal surface, or a vertical surface, whose image points are given),
and intersection boundary (between the images of two nonparallel adjacent
planar regions). The first four types of boundary are collectively
called surface boundaries because they are sufficient to outline the
contour of an object surface. Surface and intersection boundaries are
illustrated in Figure 19, assuming that only the image points belonging to
the floor and the first step are known a priori.

FIGURE 17 COMPUTED FLOOR BOUNDARY

"F" points agree with the true ones; "D" points differ from
the true ones. Points marked by "B" indicate surface
boundaries.

FIGURE 19 SURFACE AND INTERSECTION BOUNDARIES

The points that form the surface boundaries are easily 
determined: Frame and shadow boundaries are given; a known-region 
boundary is obtained as the perimeter of a given set of image points; 
a jump boundary point is established during the horizontal or the verti¬ 
cal scan if a range value differs from the preceding one by more than a 
threshold value and if the following range discontinuity is less than 
this threshold (the latter condition prevents confusing a true jump 
boundary with range increments on a very oblique surface). 
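
The jump boundary test for a single scan line can be written directly
from this description. The sketch below assumes a gap-free list of range
values along one scan; the function name and the sample values are ours.

```python
def jump_boundaries(ranges, threshold):
    """Return indices of jump boundary points along one scan line.

    A point qualifies if its range differs from the preceding value by
    more than the threshold while the next discontinuity is below it.
    The second condition rejects the steady range increments seen on a
    very oblique surface.
    """
    points = []
    for i in range(1, len(ranges) - 1):
        step_in = abs(ranges[i] - ranges[i - 1])
        step_out = abs(ranges[i + 1] - ranges[i])
        if step_in > threshold and step_out < threshold:
            points.append(i)
    return points
```

For example, the scan [10.0, 10.1, 10.2, 6.0, 6.1, 6.2] with a threshold
of 1 yields a single jump boundary at index 3, whereas the oblique
surface [10, 12, 14, 16, 18] yields none. The last sample of a scan
cannot be tested, since it has no following value.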

Intersection boundary points are established by first
assuming that adjacent pairs of surface boundary points on each horizontal
raster line (of constant $\varphi_p$ or $z_p$) bound the same planar
surface. The absolute value of the error between the measured range,
$r^*(\beta_p)$, and that predicted by these ideal surfaces, $r(\beta_p)$,
is computed for each point in these intervals. The maximum error,
$\Delta r$, equals $\max |r(\beta_p) - r^*(\beta_p)|$. If $\Delta r$
between any pair of boundary points exceeds a threshold, then an
intersection boundary is established at that point, and the process is
repeated recursively for the left and right surfaces thus established.
Figure 20 illustrates this fitting process for two surfaces crossing at
an intersection boundary. (The x-y plane corresponds to the plane swept
out by the range finder scanning a line of constant $\varphi_p$.) Notice
in Figure 19(b) that we fit trigonometric functions (instead of straight
lines) to the range data, because a "range image" of a straight line is
not itself a straight line. The above, called the "end-point fitting
method," is applicable to convex and concave polyhedral surfaces, as is
shown in Figure 21, where $N_I$, the number of intersection boundary
points, is 12. The method can also be applied to obtain a piecewise planar
approximation of a curved surface. The cost of computation entailed in
using the end-point fitting method is (in the worst case) proportional
to $N_R$, the highest level of recursion. It can be shown that

$$1 + \lceil \ln(1 + N_I) / \ln 2 \rceil \le N_R \le 1 + N_I ,$$

where $\lceil\ \rceil$ denotes round up to the nearest integer.
Fortunately, $N_R$ is likely to be equal to its lower bound, especially
if $N_I$ is large.

FIGURE 20 END-POINT $r(\beta_p)$ FITTING METHOD

FIGURE 21 MULTIPLE $r(\beta_p)$ PLANAR-REGION SEGMENTS OF A CONCAVE AND CONVEX SURFACE
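
A minimal sketch of the end-point fitting recursion on one scan line is
given below. Several details are our assumptions rather than the
implemented method: the scan angle is measured from the forward (y) axis
so that x = r sin(angle) and y = r cos(angle), the predicted range is
obtained by intersecting each ray with the chord joining the two end
points, and the function names are hypothetical. The second function
simply evaluates the bounds on the highest level of recursion for a given
number of intersection boundary points.

```python
import math

def predicted_range(beta, b1, r1, b2, r2):
    """Range along the ray at angle beta to the line through two end points."""
    x1, y1 = r1 * math.sin(b1), r1 * math.cos(b1)
    x2, y2 = r2 * math.sin(b2), r2 * math.cos(b2)
    dx, dy = x2 - x1, y2 - y1
    ux, uy = math.sin(beta), math.cos(beta)
    # Solve t * u = P1 + s * d for t; fails only if the ray is parallel to the chord.
    return (x1 * dy - y1 * dx) / (ux * dy - uy * dx)

def end_point_fit(betas, ranges, lo, hi, threshold, found=None):
    """Recursively place intersection boundaries between samples lo and hi."""
    if found is None:
        found = []
    worst, worst_err = None, threshold
    for i in range(lo + 1, hi):
        ideal = predicted_range(betas[i], betas[lo], ranges[lo],
                                betas[hi], ranges[hi])
        err = abs(ideal - ranges[i])
        if err > worst_err:
            worst, worst_err = i, err
    if worst is not None:
        end_point_fit(betas, ranges, lo, worst, threshold, found)
        found.append(worst)
        end_point_fit(betas, ranges, worst, hi, threshold, found)
    return found

def recursion_bounds(n_points):
    """Lower and upper bounds on the highest recursion level for n_points boundaries."""
    return 1 + math.ceil(math.log2(1 + n_points)), 1 + n_points
```

For the 12 intersection boundary points of Figure 21, recursion_bounds(12)
gives a lower bound of 5 and an upper bound of 13.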

2. Effects of Errors in Range Data 

We now examine the effects of errors in the range data on the
resulting region and boundary points, and propose some context-dependent
ad hoc rules that may minimize these adverse effects. First, a slightly
expanded floor region is extracted by using a smaller value of $\alpha$.
(This minimizes the error incurred in subsequently fitting non-floor
planes by ensuring the exclusion of floor points.) Small adjacent regions
or "islands" of no data may be considered part of a large region if there
is reason to assume that they resulted from erroneous data. Jump
boundaries resulting from high $r^*(\beta_p)$ "spikes" of decreased range
may be disregarded if no thin objects are expected and if $r^*$ is
relatively short. Horizontal and vertical scans, in conjunction with
different data errors, may result in wide intersection boundaries, but
fortunately have no effect on widening the surface boundaries. Two cases
are distinguished relative to $r^*(\beta_p)$ spikes: if only smooth and
relatively large region surfaces are expected, then the spikes are
disregarded; otherwise, the spikes are associated with the surface
texture.

Errors in range data may also cause gaps in the postscan
boundaries, especially the intersection boundaries. These gaps may be
bridged by the line fitting methods described next.

3. Line Fitting of Region Boundary Points 

Methods are proposed for determining gap-free surfaces and 
intersection boundaries by line fitting the boundary points found pre¬ 
viously. We begin with vertical boundaries, because these are common 
and easiest to compute. We divide the x-y plane (within the limits of
interest) into L x M cells, compute the x and y coordinates of every
boundary point, detect high-count cells using triple entry quantization,
identify the cells with the top views of vertical boundaries, and compute 
the extreme z values of each vertical boundary from its extreme I values. 
Two types of quantization cells are considered: a fixed size rectangular 
cell and a variable size curvilinear cell. The latter matches the in¬ 
herent error of a range finder of given angular tolerance. After deleting 
the vertical boundary points, we determine the horizontal boundaries. 

The computation is similar to that of vertical boundaries, except that 
we begin by detecting clusters of similar z values, compute the x and y 
coordinates of the high count clusters, and fit straight segments to the 
resulting (x,y) points. Finally, we compute the parameters of three- 
dimensional straight boundaries by fitting straight lines to the projec¬ 
tions of the remaining boundary points onto the x-y and y-z planes, and 
compute their end points. 

4. Analysis of Color Data

A color recognition module has been implemented, which takes 
as input the absolute intensities observed through three or more band 
pass filters (normally the standard red, green, and blue separations). 

Its output names a hue and provides a numerical estimate of the purity 
of that hue. This program has three distinguishing features: 

• Interactively trainable--New hues are learned by
example, when they are encountered for the first
time. The initial characterization can be refined
with additional examples if the color is later
misclassified. This approach overcomes several
disadvantages associated with using a fixed
characterization of hues, obtained, for example, from a
standard observer table. Achromatic hues are
recognized by low saturation and identified as
white, gray, or black depending on their relative
brightness compared with averages taken over the
scene.

• Additional filters possible--Hues can be charac¬ 
terized in terms of an arbitrary number of filters. 
More filters may allow discrimination between hues 
that cannot be distinguished with three color 
separations. Narrow band color (e.g., recognizing 
objects by the presence of certain spectral lines) 
is an extreme example of how additional filters 
could be used to capitalize on knowledge of the 
environment. 

• Color constancy--Illumination observed from objects 
is a product of incident illumination and object 
reflectivity. To obtain the intrinsic hue of an 
object it is necessary to normalize by the observed 
illumination to eliminate the incident component. 

Our system is designed to achieve first order color 
constancy by normalizing out the illumination 
source. (Second order effects on incident illumi¬ 
nation, such as reflection from neighboring objects, 
are not handled.) Source normalization allows 
colors learned under one illumination to be recog¬ 
nized under any other illumination, once that 
source has been calibrated using an object whose 
color is known. 
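
First order source normalization can be sketched as follows. This is a minimal Python illustration, not the implemented module: it assumes one intensity per filter, models the observed intensity as source times reflectivity, and classifies by nearest learned signature. All function names and the nearest-neighbor rule are assumptions introduced here.

```python
import math

def calibrate_source(observed_reference, known_reflectance):
    """Estimate the source strength per filter from an object whose color
    (reflectance per filter) is known."""
    return [o / r for o, r in zip(observed_reference, known_reflectance)]

def normalize(observed, source):
    """Divide out the source per filter, then scale to unit sum so that
    only the relative (chromatic) content remains."""
    ratios = [o / s for o, s in zip(observed, source)]
    total = sum(ratios)
    return [r / total for r in ratios]

def classify(observed, source, learned_hues):
    """Return the name of the learned hue whose normalized signature is
    nearest to the normalized observation (hypothetical decision rule)."""
    signature = normalize(observed, source)
    return min(learned_hues,
               key=lambda name: math.dist(signature, learned_hues[name]))
```

Because the list of intensities may have any length, the same sketch covers the case of more than three filters. Hues learned under one source are stored as normalized signatures, so they can be compared directly with observations made under a different, calibrated source.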


5. Generation of Test Data 

The perceptual system uses arrays of both color and range data. 
Because we did not want to wait until the scanning rangefinder (described 
in Section V) was built, interfaced, and calibrated, we decided to create 
simulated range data. The interactive program that produces an array of 
range data from a picture array works as follows. A picture is taken 
and its gradient is shown on the CRT display. An operator controls a 
TV cursor to outline polygonal boundaries of the planar regions in the 
scene. He also identifies each region, thereby providing the program
with the parameters that define each plane in three-dimensional space.*
Thus, for every point in the picture, the corresponding plane in three- 
dimensional space can be found by table lookup. 

The procedure described by Wichman⁸ and Falk⁹ is used to
determine a ray in space for each picture point. Their procedure uses
four reference points on the floor to determine the collineation trans¬ 
formation between picture points and floor points, and two reference 
points off the floor to determine the lens center. Thus, for any picture 
point the collineation transformation yields the corresponding floor 
point, and this together with the lens center defines the ray in space. 
The range is obtained merely by computing the intersection of this ray 
with the corresponding plane. Repetition of this process for each point
in the picture produces a range array in exact register with the picture
array.
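
The final step, intersecting the ray with the plane of the region, can be sketched in Python. The sketch assumes that the lens center and the floor point for the picture element are already known, and that each plane is stored as a normal vector n and offset d with n . p = d; this plane representation and the function name are assumptions made here.

```python
import math

def simulated_range(lens_center, floor_point, plane_normal, plane_offset):
    """Distance from the lens center, along the ray through floor_point,
    to the plane n . p = d. Returns None when the ray is parallel to the
    plane or the plane lies behind the lens."""
    direction = [f - c for f, c in zip(floor_point, lens_center)]
    length = math.sqrt(sum(d * d for d in direction))
    unit = [d / length for d in direction]
    denom = sum(n * u for n, u in zip(plane_normal, unit))
    if abs(denom) < 1e-12:
        return None  # ray parallel to the plane
    t = (plane_offset
         - sum(n * c for n, c in zip(plane_normal, lens_center))) / denom
    return t if t > 0 else None
```

Calling this once per picture point, with the plane found by table lookup for that point, would fill a range array in register with the picture array.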

F. Research Methodology 

In the preceding sections we have described our design for a system 
for interpreting perceptual data. Our belief is that such a system can 
be successfully implemented only if we explicitly accommodate ourselves 
to the empirical nature of machine perception research. Accordingly, 
the design includes certain key features to facilitate experimentation 
with real scenes and to enable us to incorporate the experimental results 
into an evolving system implementation. We first single out these fea¬ 
tures and then present our overall research plan. 


* These parameters are currently measured manually. If desired, they
could also be obtained interactively by having the operator designate
corresponding points in two stereo views.


1. System Features


Two key features that should make our system especially valuable 
as a research tool are that it is data driven and highly interactive. 

a. Data Driven Organization 

Almost all of the system's perceptual knowledge will re¬ 
side in its knowledge base, in easily accessible declarative form. This 
knowledge includes object attributes and their relations, as well as the 
cost and reliability of perceptual operators for extracting them. The 
system itself consists essentially of just two relatively small programs. 
One selects a cost-effective sequence of attributes and the other
interpretively executes the resulting strategy. Very little perceptual
understanding is required for either program. Thus, once they are de¬ 
bugged, the system can be incrementally refined and expanded primarily 
by modifying the symbolic knowledge base (and occasionally by programming 
a new perceptual operator). 

b. Facilities for Interaction 

The experimenter will be able to test manually generated 
strategies either by interactively calling system modules or by speci¬ 
fying the complete strategy as a program for interpretive execution. 

New visual concepts will be designated either by entering symbolic des¬ 
criptions or, in the absence of adequate semantics, by outlining them 
(with a cursor) on a grey scale display. 

The combined capabilities for graphical as well as sym¬ 
bolic communication should prove to be a powerful experimental tool for 
perception research. For example, specified perceptual operators can be 
applied interactively to graphically designated surfaces to determine 
how effectively they discriminate that surface from adjacent ones or
from those previously characterized in visual memory. The resulting
information can then be used to select, respectively, good region growth 
and acquisition criteria. The outlined surface can also serve as an 
exemplar for learning new colors, textures, shapes, and the like, "by 
example." Symbolic values for these attributes (e.g., red, grainy, 
elongated) can be represented internally in terms of perceptual operators 
that discriminate the designated surface from exemplars of previously 
defined values. Finally, structural models can be constructed inter¬ 
actively by specifying spatial constraints between outlined parts of an 
object. The constraints can be provided symbolically (e.g., part A must 
be above part B) or photogrammetrically, by specifying measurement 
routines to extract metric relationships designated graphically in the 
picture. 


2. Plans

We plan to implement the system in, roughly, four phases. 

Phase 1: A skeleton system is now being assembled, consisting
of some primitive perceptual operators, an interactive interpreter for
applying those operators, and a facility for graphical communication.
This facility will initially be used to validate experimentally our basic
premise that there exist easy ways of distinguishing objects in
constrained contexts. The investigator will attempt interactively to
generate distinguishing features strategies (like those in Section III-B)
for finding the various scenario objects in a few test scenes. Criteria
for acquisition, growth, region evaluation, and validation will be
selected pragmatically, by choosing the most distinctive attributes of an
object that are easily represented in terms of available perceptual
operators. Immediate feedback from the system will indicate the validity
of these intuitions.

Strategies will occasionally fail because the investigator
had not anticipated the literal implications of applying a single
criterion to all objects. The investigator may then specify more constraining
criteria. However, we expect that failures will more commonly arise from
inadequate primitives that cannot reliably perceive intended
discriminations. In such cases, one might again choose other criteria, but if
no suitable alternative exists it becomes necessary to refine the
programmed operators. We expect a set of adequate operators to emerge as
a major result of this phase of research.

Successful perceptual strategies constitute procedural descrip¬ 
tions of objects, and will be retained to establish the system's initial 
world model. The corresponding initial planning algorithm for finding 
an object is then simply to retrieve the appropriate strategy. 

Phase 2: In this phase our emphasis will shift to machine 
generated strategies for finding a limited set of objects, efficiently as 
well as reliably, in a wide range of scenes. To this end we will augment 
the knowledge base with descriptions of objects in declarative form and 
will develop basic modules for planning and execution monitoring, like 
those outlined in Section III-C. 

Progress in Phase 2 depends in good part on obtaining quantita¬ 
tive expressions for utility and recognition confidence. We seek, for 
each object feature, numerical estimates of utility that correlate rea¬ 
sonably well with the effectiveness of corresponding strategies. We 
also require pragmatic expressions for recognition confidence, so that 
we can terminate our sequential decision process on the basis of confi¬ 
dences observed so far. 

We will not be concerned in this phase with planning efficiency,
but will instead search exhaustively for the best plans. Objects will
be described in the knowledge base directly in terms of their
distinguishing features. These features, in turn, will be described in terms of
specific feature extraction operators. This allows us to defer to a
subsequent phase of research the key problem of automatically discovering
good distinguishing feature representations for objects.

We expect to arrive empirically at both adequate object descrip¬ 
tions and utility estimates by evolving them from the initial intuitive 
estimates developed during Phase 1. A human experimenter will analyze 
failures of machine generated strategies and interactively refine utility 
estimates. He will also be able to enhance discrimination by suggesting 
additional object attributes and/or contextual constraints. Interactive 
refinement of crude intuitive estimates should prove an effective com¬ 
promise between the unworkable extremes of unsupervised learning and 
exhaustively analyzed initial models. 

When strategies can be generated for reliably finding the 
original objects, the scenario will be incrementally enlarged to
encompass additional office items. Descriptions will again be interactively
generated. The experimenter will suggest initial features that distin¬ 
guish the new object from others already in visual memory. Those features 
may then be refined with feedback on their empirical effectiveness. 

Phase 3: Phase 3 will concentrate on developing heuristics
for efficient planning. The effectiveness of strategies generated under
various simplifying assumptions will be compared with the effectiveness
of those generated by exhaustive search in Phase 2. Two representative
heuristics that will be studied are the use of a fixed budget percentage
as a termination criterion, and the elimination of combinatorics
introduced by the consideration of shared modules.

Phase 4: In this final phase we will attempt to automate much
of the decision making and problem solving handled interactively in 
Phases 1 and 2. Specifically, the system itself should deduce from its 
general fund of world knowledge the features and corresponding represen¬ 
tations that best distinguish a given object in a particular context. 

The data base might now include knowledge about the functions and material 
compositions of objects, models of perceptual operators and their error 
processes, and the level of detail needed to fulfill various task ob¬ 
jectives. 

Phase 4 is a formidable long range goal that may well require 
breakthroughs in automatic programming. However, we feel that perceptual 
strategies constitute a particularly well structured domain in which to 
explore the key issues. In fact, we suspect it will again be possible
to proceed incrementally, by adding knowledge and inference rules, as
necessary, to the basic Phase 2 system.


IV SYSTEM SOFTWARE


A. Introduction 

Several features were needed at the close of the previous year in 
order to enhance the utility of the robot as a general purpose experi¬ 
mental vehicle. This section discusses developments on this front from 
a software viewpoint; as will be explained presently, we have made con¬ 
siderable progress. For the purpose of discussion, we can view the pro¬ 
cess of increasing the experimental utility as proceeding along two paths: 
making the robot easier to use, and telling people how to use it. Here 
we describe the former task; a recently completed primer¹⁰ describes the
current robot system from a user's viewpoint. 

There were three parts to the task of making the robot more usable. 
First, we made the robot usable to an experimenter who has a minimum of
specific knowledge of its inner workings. Second, we greatly increased
the speed of the robot so that the experimenter would not be thwarted by
lack of extraordinary patience. Finally, we made the overall robot action
system more reliable. Now, for example, the robot can execute a fairly 
long STRIPS-produced plan with reasonable assurance of success in the 
face of normal day-to-day radio interference and other adversities. 

B. Utility Factors 

1. Comprehensibility 

We have increased the comprehensibility of the robot action
routines in two ways: we have made them available in a neat,
self-contained package, and we have included in this package a command
interpreter. Formerly, we had provided the action routines primarily in
symbolic (source code) form for the would-be user (necessarily a
programmer) to include with his own software. Now the primary form in which
the action routines are available is a LISP SYSOUT file, which makes the
action package almost as easily available as any other major subsystem
within TENEX. We now have a command interpreter (which was easily
provided as a minor extension of an already existing debugging aid). This
modest amount of software permits a novice experimenter to use the action
routines as programs, and to easily obtain informative status reports
couched in meaningful terms. The combination permits a person to
establish contact with the robot in less than a minute by typing three command
lines.

2. Speed 

The action routines in their most primitive state were executed 
so slowly as to require inordinate patience on the part of the user. 
Around the beginning of the project year, we changed from DEC's
time-sharing system to BBN's TENEX time-sharing system. This gave us a
tremendous speedup due to the difference in swapping techniques; formerly,
we had to swap a program's entire core image into core before running it, 
whereas we now can swap only a small part of a program into core before 
running it. The gain was especially great because a typical job mix 
contains several programs each large enough nearly to fill user core. At 
the beginning of the project year, then, a single complex ILA
(intermediate level action) could still take as long as 40 minutes to execute.

At that stage, the action routines were run entirely in an interpretive 
mode. The speedup techniques we employed were a combination of compiling 
and rewriting the routines and transferring the whole package to BBN LISP.
The net result of our efforts was to reduce execution times of the more
complicated ILAs by a factor of about five.

3. Reliability


The reliability improvements we made are an extension of the
ruggedness already built into the software. For example, the ILAs have 
always been capable of coping with accumulated small mechanical inaccura¬ 
cies, or with more serious events such as encountering unexpected ob¬ 
stacles. This year we extended the ruggedness to an ability to cope with 
more drastic failures, such as a telemetry failure. Provision is made 
for error handling in many places throughout the action routines. Each 
routine is rather limited in scope, and so the overall ruggedness is due 
more to the dovetailing of the error provisions than to the power of the 
error handling in any one of them. 

One of the more interesting illustrations of this ruggedness
was the following incident. During an experiment, the robot lost power
in mid-maneuver; power was hurriedly restored to the robot, which without
further intervention visually ascertained its position and proceeded with
its execution of the ILA in progress at the time. The loss of power
control caused the LLAs to report a large uncertainty in the robot's
position. The response of the ILAs was to attempt (multiple times, as is
their way) to ascertain the robot's position visually. Once they finally
succeeded, their normal reassessment of the goals and the information
at hand led the ILAs to continue the maneuvers in progress. The
reliability features of the robot software are greatly enhanced (if not made
possible) by the flexible error handling features of the BBN LISP system.

C. Translation to BBN LISP 
1. Overview 

For reasons outlined elsewhere in this section, we converted
the robot action routines from the Stanford LISP system to the BBN LISP
implementation. This translation was tremendously aided by TRANSOR, a
BBN-supplied translating program. We added a few programs to the TRANSOR
system to automate additional required or helpful ancillary processes,
and we tailored the transformations to the purpose at hand. TRANSOR made
all straightforward transformations automatically; it noted all doubtful
constructs and transformed them as well as it could; and it especially
flagged all constructs requiring hand translation. The principal part
of the action software requiring hand work was the support package, which
handles symbolic disk files containing STRIPS operator descriptions and
the ILA routines (in the form of modified Markov tables). This support
package had been well integrated into SMILE, the Stanford LISP package
for handling symbolic files. We now have attached it to BBN's
"pretty-print" package, which serves the same function in the BBN implementation.
The transfer of the support package required considerable hand work since
it was outside the useful scope of TRANSOR. However, this hand work
afforded us the opportunity to rework this relatively new software, not
only to weed out the vestiges of the usual false starts, but also to increase
its flexibility and coherence.

2. Forking 

To date we have transferred the individual pieces of the robot 
system to BBN LISP. What remains is the reestablishment of the links 
from the action routines to the STRIPS/PLANEX cluster on the one hand 
and to the vision routines on the other. Part of the price of the high 
degree of flexibility offered by the BBN LISP implementation is a reduc¬ 
tion in the memory size available to a user's own program. In our case, 
the available address space was large enough to accommodate either the 
STRIPS/PLANEX cluster or the action package alone, but it was not large 
enough to accommodate them both. A more fundamental restriction of the
BBN LISP system, however, is its inability to accommodate FORTRAN (or
other non-LISP) programs; the Stanford LISP system provided a usable, if
awkward and inflexible, means of accommodating such "foreign" programs.
There is, however, a boon in BBN's TENEX time-sharing system: forking. 
That is, any program can have some number (about 25) of subprograms, and 
each of those programs (the mother as well as her daughters) can have 
completely independent address spaces of one-quarter million words each. 
Other projects within SRI's Artificial Intelligence Center have jointly 
developed software to handle the interfacing of forks with their descen¬ 
dants (and ancestors), including creation and other initialization capa¬ 
bilities. This software permits a LISP program to generate multiple in¬ 
dependent subforks and to call FORTRAN subroutines within them. We in¬ 
tend to adapt this software to permit the subfork to be LISP and/or to 
call subroutines in its mother. The former will permit PLANEX actually 
to call the action routines, and the latter will permit the vision rou¬ 
tines to move the robot via the action routines (instead of requiring 
the vision routines to have their own action routines). 

3. Additions for TRANSOR 

Before we leave the subject of the transfer to BBN LISP, we 
will briefly describe our addition to BBN's TRANSOR. As supplied, it 
consisted of four parts: a rule driven translator, a set of rules, a 
package of programs to generate or modify such a set of rules, and a
prescanner to make certain simple character transformations so the 
translator could read the program to be translated. Of these, we ex¬ 
tended the power of the set of rules (mainly to translate our modified 
Markov tables) and the prescanner (to cope with the way Stanford LISP 
handles overflow of a print line). This still left two of the conven¬ 
tions of Stanford LISP with which TRANSOR could not cope: (1) Stanford 
LISP could (and in practice often did) use octal numbers (instead of 
decimal), and (2) it also could (and often again did in practice) flag 
decimal numbers as such by following them with a decimal point. The BBN
LISP system (in which the translator was embedded) would interpret an
octal number as the decimal number with the same sequence of digits, and 
it would interpret the trailing decimal point as a sign to represent the 
number internally in floating-point format. However, the Stanford LISP 
system is also perfectly capable of producing numbers in a BBN-compatible 
format, so we simply wrote a small "pre-prescan" program in Stanford LISP 
that would read a Stanford LISP file in any format and write it in a 
format suitable for the BBN system. This gave us a system in which a 
file of Stanford LISP programs was first "pre-prescanned," then pre¬ 
scanned and then translated; this produced the translated version as 
well as a file of notes pointing out dubious portions of the output file. 
This last had to be listed so the programmer could check the translation; 
then it and the two intermediate files from the prescanners had to be 
deleted. Typically, the next step was to load the translated file for 
checkout and any necessary hand modification. We wrote a small program 
to perform all these steps and to produce a translated version given 
only the name of the Stanford LISP version of the file. 
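
The numeric part of the "pre-prescan" step can be illustrated with a small sketch. The original program was written in Stanford LISP and relied on that system's own reader and printer; the Python version below swaps in a regular expression over raw text, which is a different technique. It assumes that the file's ambient radix is octal, that a bare digit string followed by a decimal point marks a decimal integer, and that strings and comments need no special treatment. The function name is hypothetical.

```python
import re

# An integer token: optional sign, digits, optional trailing point, not
# embedded in a symbol (X12) or a floating-point number (1.5).
_INT = re.compile(r"(?<![\w.+-])([+-]?)(\d+)(\.?)(?![\w.])")

def to_decimal_tokens(text):
    """Rewrite integer tokens so that every one is plain decimal."""
    def fix(match):
        sign, digits, point = match.groups()
        if point:
            # "17." was flagged as decimal: drop the flag, keep the value.
            return sign + digits
        try:
            # Unflagged digits are taken as octal (assumption).
            return sign + str(int(digits, 8))
        except ValueError:
            # Contains 8 or 9, so it cannot be octal; leave it untouched.
            return match.group(0)
    return _INT.sub(fix, text)
```

For example, under these assumptions `(SETQ N 17)` becomes `(SETQ N 15)` and `(SETQ M 17.)` becomes `(SETQ M 17)`, so a reader that treats digit strings as decimal and a trailing point as floating point sees the intended values.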



V SYSTEM HARDWARE 


A. Introduction 

In this section we briefly describe several pieces of special purpose
hardware that we are currently building or interfacing to our robot
system. This hardware includes two types of scanning range finders, a
Doppler radar (for detecting motion), and a Unimate arm. We also briefly
describe the current configuration of our complete computer system.

B. A Time-of-Flight Range Scanner 
1. Introduction 

For the purpose of scene analysis in a three-dimensional world, 
it is desired to generate a "picture" analogous to a television picture,
but where each picture point is associated with an analogue value denoting 
range rather than light intensity. The simplest method for determining 
the range of each picture point might seem to be by measurements on 
stereo pairs of images taken by the same television camera. In practice 
this method has proved to be seriously limited on at least four counts: 

• Inadequate resolution: at the 120 by 120 element resolution 
we have been using, the displacement (measured in elements) 
between corresponding points in the images is not enough. 

• The range accuracy falls off rapidly with distance. 

• The calculations for so many points consume too much com¬ 
puter time. 

• There are difficulties due to occlusion, because a picture 
element visible from one lens viewpoint may not be visible 
from the other.

A scanning range finder currently under construction eliminates
all of the above problems in that it possesses high range resolution that 
does not vary with distance, and gives direct readout without computation. 
In addition, since it is a "one-eyed" device, the occlusion problem does 
not arise. 

2. The Experimental Model of the Range Scanner 

A simplified block diagram of the range scanner is given in 
Figure 22 and a more detailed diagram in Figure 23. The range scanner 
consists of three functional components:

• A transmitter that amplitude modulates a light beam with a 
9-MHz sine wave 

• A mirror scanner that sweeps the modulated beam over the 
field of interest 

• A receiver that picks up the light from whatever object 
intercepts the scanning beam. After amplification the 
output from the receiver is demodulated by a phase demodu¬ 
lator. 

The demodulator effectively measures the length of time required for a 
light beam to make the round trip from range scanner to light scattering 
object and back. The range value, of course, is a constant multiple of 
this time. 
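
The relation between round-trip time, modulation phase, and range can be made concrete with a short Python sketch. It assumes propagation at the vacuum speed of light and the 9 MHz modulation frequency quoted above; the function names are introduced here for illustration only.

```python
import math

C = 299_792_458.0  # speed of light in m/s (vacuum value, assumed for air)

def range_from_round_trip(seconds):
    """Range is half the round-trip path length."""
    return C * seconds / 2.0

def phase_from_range(range_m, mod_hz=9e6):
    """Phase lag (radians) of the modulation after a round trip."""
    return 2.0 * math.pi * mod_hz * (2.0 * range_m / C)

def unambiguous_range(mod_hz=9e6):
    """Range at which the phase lag reaches a full cycle."""
    return C / (2.0 * mod_hz)
```

At 9 MHz the phase lag completes one full cycle at roughly 16.7 m, so ranges of a few meters correspond to a modest fraction of a cycle.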

The merits of the phase demodulator are worth emphasizing. In 
this device the output from the receiver is compared with the phase of the 
modulating signal feeding the transmitter, and a dc level is derived that 
is a function only of this phase difference. Since the phase difference 
varies linearly with the distance to the light scattering object, the dc 
level is a measure of the range of the object. The method has the virtue 
of being a one-frequency method not depending on the amplitude of the 
received signal; it is therefore possible to trade range discrimination
against signal integration time. The sensitivity calculation described
later indicates that it should be possible to make 1,000 measurements per
second with a range discrimination of 0.1 inch using a transmitter power
of only 1 milliwatt.

FIGURE 22 SCANNING RANGE FINDER—SIMPLIFIED

Considering the transmitter section, the process begins with
the 9 MHz crystal oscillator that drives the modulator via an adjustable
phase shifter. The phase shifter provides an adjustment of approximately
±60° so that one may adjust for a precise zero or, alternatively, offset
the minimum range and use increased sensitivity in the read-out. The zero
may also be displaced by fixed amounts and with great precision by short
lengths of coaxial cable inserted between the various units. The
modulator then provides amplitude modulation of the light source, which
may be either a 9050 A.U. gallium arsenide diode or a 6328 A.U.
helium-neon laser. The latter will most probably be used since it can provide a
smaller spot size (better lateral resolution). Although gas lasers
typically show 1 percent amplitude noise modulation, this does not
interfere, except to a minor extent, with the precision of the phase
measurement. The modulated light beam is then deflected by a scanning mirror
system (in two orthogonal directions) so that it scans the field of view
that is of interest.

FIGURE 23 SCANNING RANGE FINDER—DETAILED

The receiver section makes use of the same scanning unit so 
that its line of sight coincides with that of the laser beam. The return 
signal is put through an interference filter to eliminate the ambient il¬ 
lumination and passes to a high-grade photomultiplier, an RCA CA 31034, 
where it is converted to electrical form. The use of a premium quality 
photomultiplier is warranted since it is here that the signal-to-noise
ratio of the system is determined. The primary requirements are for an 
efficient photocathode and low dark current. It is anticipated that the 
signal level from the CA 31034 will be at least 1 millivolt. This signal 
is then put through a 2.5 kHz wide crystal filter (form factor 2:1) at 
the first opportunity in order to minimize effects due to electrical 
interference, in particular before any saturation can occur. The filter 
is followed by a symmetrical limiter to remove amplitude variations due 
to inverse square law, orientation of the surface, reflection coefficient, 
laser noise, and so on. Note that the limiter must limit the top and 
bottom halves of the sine wave symmetrically since asymmetry usually in¬ 
troduces a shift in phase. 

Perhaps the most stringent part of the design occurs in the phase
demodulator. What is required is a clean multiplier. We wish to multiply
together the reference signal, $e_1 \sin \omega t$, and the reflected signal that
has passed down the receiver chain, $e_2 \sin(\omega t + \varphi)$. We do not have a
great deal of control over the various phase shifts that occur down the
receiver chain, but we assume they can be nulled out by the phase shifter
mentioned previously. We are interested in developing a dc level that
varies linearly with the variable part of $\varphi$; i.e., the part due to the
variable path length between the deflecting mirrors and the reflecting
object. The output, M, from the multiplier is

$$
\begin{aligned}
M &= k \, e_1 e_2 \sin \omega t \, \sin(\omega t + \varphi) \\
  &= k \, e_1 e_2 \sin \omega t \left[ \sin \omega t \cos \varphi + \cos \omega t \sin \varphi \right] \\
  &= k \, e_1 e_2 \left[ \cos \varphi \left( \frac{1}{2} - \frac{\cos 2\omega t}{2} \right) + \sin \varphi \, \frac{\sin 2\omega t}{2} \right]
\end{aligned}
$$

where k is an arbitrary constant. The terms in $2\omega t$ are easily removed
by a low-pass filter leaving an output depending only on $\cos \varphi$; we shall
be working over the linear region either side of 90°, as shown in Figure 24.
Note that the output also varies with $e_1$ and $e_2$, and therefore these must
be kept constant to at least the precision of range measurement. In the
hardware that has been constructed, the output fluctuation is of the
order of ±3 millivolts for a working output range of ±3 volts.
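
The effect of the low-pass filter can be checked numerically. The Python sketch below multiplies a reference sine of amplitude e1 by a return sine of amplitude e2 that lags by a phase phi, and averages the product over a whole number of modulation cycles. Plain averaging stands in for an ideal low-pass filter, and the sample counts are arbitrary; both are assumptions of this illustration.

```python
import math

def demodulated_dc(phi, e1=1.0, e2=1.0, k=1.0,
                   cycles=100, samples_per_cycle=64):
    """Mean of k * e1*sin(wt) * e2*sin(wt + phi) over whole cycles.
    The double-frequency terms average to zero, leaving
    (k * e1 * e2 / 2) * cos(phi)."""
    n = cycles * samples_per_cycle
    total = 0.0
    for i in range(n):
        wt = 2.0 * math.pi * i / samples_per_cycle  # omega * t
        total += k * e1 * math.sin(wt) * e2 * math.sin(wt + phi)
    return total / n
```

The result is zero at a phase of 90° and changes sign on either side of it, which is why that neighborhood serves as the working region. The output scales directly with both amplitudes, which shows why they must be held constant.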


3. Calculation of Sensitivity 

The initial design assumptions are as follows:

Scanning transmitter power (collimated beam) = $10^{-3}$ watt

Maximum range = 10 ft

Minimum reflectivity = 1 percent

The reflected power is assumed to be scattered uniformly over
a hemisphere.

Receiver capture area = 1 sq in.

FIGURE 24 ANALOG OUTPUT FUNCTION


The received power, P, applied to the photomultiplier is

$$
P = 10^{-3} \cdot 10^{-2} \cdot \frac{1}{2\pi \cdot (120)^2} \approx 10^{-10} \ \text{watt}
$$

The sensitivity of the RCA CA 31034 photomultiplier is given as $4 \times 10^4$
amps per watt at 6328 A.U., and $1 \times 10^4$ amps per watt at 9050 A.U. For
an input of $10^{-10}$ watt at 9050 A.U., the output current = $10^{-10} \times 10^4$ amps,
which equals $10^{-6}$ amps. Assuming a load impedance of 2,000 ohms,

$$
\text{Output level} = 2 \times 10^3 \times 10^{-6} \ \text{volts} = 2 \ \text{millivolts}
$$

For a bandwidth of 2.5 kHz in the receiver chain, the self-generated noise
level should be less than 2 microvolts; this is compatible with 10-bit
accuracy for the range output and 1000 measurements per second.
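
The arithmetic of this sensitivity estimate can be reproduced with a few lines of Python. The default values are the design assumptions stated above (1 milliwatt, 1 percent reflectivity, 10 ft, 1 sq in., 10^4 amps per watt at 9050 A.U., 2,000 ohms, 2 microvolts of noise); the function name and the returned dictionary are hypothetical conveniences.

```python
import math

def sensitivity_budget(tx_power_w=1e-3, reflectivity=0.01, max_range_ft=10.0,
                       capture_area_sq_in=1.0, sensitivity_a_per_w=1e4,
                       load_ohms=2000.0, noise_v=2e-6):
    """Received power, signal voltage, and equivalent bits of resolution."""
    range_in = max_range_ft * 12.0
    # Reflected power spreads uniformly over a hemisphere of this area.
    hemisphere_sq_in = 2.0 * math.pi * range_in ** 2
    received_w = (tx_power_w * reflectivity
                  * capture_area_sq_in / hemisphere_sq_in)
    signal_v = received_w * sensitivity_a_per_w * load_ohms
    return {"received_w": received_w,
            "signal_v": signal_v,
            "bits": math.log2(signal_v / noise_v)}
```

Without rounding, the received power comes to about 1.1 x 10^-10 watt and the output level to about 2.2 millivolts. The ratio of signal to a 2 microvolt noise level is then a little over 1,000, or about 10 bits.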

C. A Triangulation Range Finder 

A range finder based on triangulation was designed and built by the
late John H. Munson. A schematic drawing of this range finder is shown
in Figure 25. A collimated light beam scans the field of view by means
of a rotating "transmitting" mirror. A portion of this beam, scattered
or reflected from a surface "point" of an object in the scene, is accepted
by a rocking "receiving" mirror. The angular velocity of the
transmitting mirror is much higher than that of the receiving mirror. The
functions that relate time, t, to the angles $\varphi_T$ and $\varphi_R$ of the transmitting
and receiving mirrors, respectively, are known. The cyclic time of the
receiving mirror is divided into N time intervals, where N is the number
of points scanned horizontally. For each of these time intervals, $\varphi_R$
is known and may be regarded as fixed relative to the time-varying $\varphi_T$.
By measuring the time of acceptance of the scattered light by a receiving
photodetector, $\varphi_T(t)$ is computed. Knowing $\varphi_T$, $\varphi_R$, and the distance
between the transmitting and receiving mirrors, the range (defined as
the distance between the receiving mirror and an object point) is computed
for each of the horizontally scanned N points. This procedure is
repeated as the tilt angle of the whole system is varied incrementally
by means of a tilting drive. A raster of range values for the entire
field of view is thus obtained.
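
The range computation for a single scanned point can be sketched with the law of sines. The angle convention here is an assumption, since Figure 25 is not reproduced: both mirror angles are taken as interior angles of the triangle formed by the two mirrors and the object point, measured from the baseline joining the mirrors. The function name `range_from_receiver` and its parameters are hypothetical.

```python
import math

def range_from_receiver(angle_tx_rad, angle_rx_rad, baseline):
    """Distance from the receiving mirror to the object point.

    angle_tx_rad: interior angle at the transmitting mirror (assumed convention)
    angle_rx_rad: interior angle at the receiving mirror (assumed convention)
    baseline:     distance between the two mirrors

    The angle at the object point is pi - angle_tx - angle_rx, so by the law
    of sines: range / sin(angle_tx) = baseline / sin(angle_tx + angle_rx).
    """
    angle_sum = angle_tx_rad + angle_rx_rad
    if not (0.0 < angle_tx_rad and 0.0 < angle_rx_rad and angle_sum < math.pi):
        raise ValueError("angles do not form a triangle with the baseline")
    return baseline * math.sin(angle_tx_rad) / math.sin(angle_sum)

# Check against a point with known coordinates: transmitting mirror at (0, 0),
# receiving mirror at (baseline, 0), object point at (px, py).
baseline = 1.0
px, py = 0.5, 4.0
angle_tx = math.atan2(py, px)
angle_rx = math.atan2(py, baseline - px)
computed = range_from_receiver(angle_tx, angle_rx, baseline)
expected = math.hypot(baseline - px, py)
```

As the object point recedes, the sum of the two angles approaches 180° and the denominator approaches zero, so small angular errors produce large range errors. This agrees with the first disadvantage listed below, that accuracy degrades as the range increases.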

Although range data have been collected by this range finder, 
we have decided to concentrate our efforts on the range finder described 
in the preceding section, because the range finder of Figure 25 has the 
following disadvantages: 
• The accuracy and sensitivity are inadequate, especially as
the range increases.

• Range data are missing for those object points that are seen
by the receiving mirror but not by the transmitting mirror.

• The maximum range is limited to 8 feet by the intensity of 
the transmitting light source. 

• The offset between the receiving mirror and a camera used 
in scene analysis poses a registration problem. 

D. A Radar Motion Detector 

A small, inexpensive Doppler radar motion detector was physically
installed on the robot vehicle during the summer. Its primary commercial
application is as an intrusion alarm. This sensor, when made operational,
can detect even a slight motion that occurs within a teardrop-shaped
region 40 feet wide and 50 feet long directly in front of the robot. The
device is essentially a microwave transmitter radiating from a parabolic-section
horn antenna and a receiver operating from a second similar
antenna. The FCC-allocated frequency is 10,525 MHz ±25 MHz.

This novel capability will provide the robot with a way of dealing 
with dynamic changes in its environment and, when interfaced, may be the 
basis for a new series of experiments. For example, an interrupt from 
the motion detector could be used to turn on the TV camera in order to 
determine what changes have taken place.

E. Unimate Arm 

A Unimate industrial arm is being interfaced with the PDP-15/PDP-10
time-sharing system to provide a facility for multiple-effector experiments.
As originally manufactured, the Unimate arm and hand assembly is
hydraulically positioned under the control of a program stored on a magnetic
drum. The absolute position for each of five degrees of freedom
is determined by optical shaft encoders. The Unimate is also conveniently
equipped with input lines to sense external events, and output lines to
provide indication of completed internal events.

In converting the Unimate to computer control, the memory drum is replaced
by a buffer register that is connected to the computer through an
interface. A block diagram of the interconnections for a PDP-15/Unimate
system is shown in Figure 26. A block diagram of the interface control
logic is shown in Figure 27.
FIGURE 26 PDP-15/UNIMATE COMMUNICATIONS: BLOCK DIAGRAM


F. Current System Configuration

The current configuration of our computer system is shown in Figure 28.
The primary changes made during the past year have been the addition of a
fifth disk pack drive, two magnetic tape drives (and interface), a scan
converter (that will allow us to display gray-scale digital pictures),
and a new television camera suitable for obtaining color information.










FIGURE 27 PDP-15/UNIMATE COMMUNICATIONS: UNIMATE CONTROL LOGIC 
A. Publications


B. Raphael, "The Role of Formal Theorem Proving in Artificial Intelligence,"
Artificial Intelligence Center Technical Note 63,
Stanford Research Institute, Menlo Park, California (February 1972).
This was the first of two lectures presented at the Japan Industrial
Technology Association International Symposium on Pattern Information
Processing Systems, Tokyo, March 6-17, 1972.


________, "Robot Research at Stanford Research Institute," Artificial
Intelligence Center Technical Note 64, Stanford Research
Institute, Menlo Park, California (February 1972).
This was the second of two lectures at the Japan Industrial Technology
Association International Symposium on Pattern Information
Processing Systems, Tokyo, March 6-17, 1972.

R. Fikes, P. Hart, and N. Nilsson, "Some New Directions in Robot 
Problem Solving," Artificial Intelligence Center Technical Note 68, 
Stanford Research Institute, Menlo Park, California (May 1972). 

Also to be published in Machine Intelligence 7. 


________, "Learning and Executing Generalized
Robot Plans," Artificial Intelligence Center Technical Note 70, Stanford
Research Institute, Menlo Park, California (July 1972).

Also to appear in Artificial Intelligence, Vol. 3, No. 4 (Winter 1972).

D. Nitzan, "Stereopsis Error Analysis," Artificial Intelligence
Center Technical Note 71, Stanford Research Institute, Menlo Park,
California (September 1972).


________, "Scene Analysis Using Range Data," Artificial Intelligence
Center Technical Note 69, Stanford Research Institute, Menlo
Park, California (August 1972).

P. E. Hart and N. J. Nilsson, "Shakey: Experiments in Robot
Planning and Learning," 25-minute color and sound film that reviews
recent experiments, Stanford Research Institute, Menlo Park, California
(August 1972).


B. Presentations 


Technical meeting on Computer Vision held at Pajaro Dunes, California, 
November 10-13, 1971. Attendees from SRI: T. D. Garvey, C. A. Rosen, 
and C. L. Fennema. 

N. J. Nilsson, seminar talk entitled "Composing, Using, and Executing
Robot Plans" at the University of Illinois, Champaign, Illinois,
December 16, 1971.

Firbush Point Seminar held at the University of Edinburgh, Scotland,
February 28 to March 3, 1972. Attendees from SRI: R. E. Fikes,
P. E. Hart, N. J. Nilsson, and J. F. Rulifson.

N. J. Nilsson, lecture tour in 1972: University of Calgary, Alberta,
Canada, March 22; University of Alberta, Edmonton, March 23; University
of British Columbia, Vancouver, March 24. The talk was entitled
"A Problem-Solving System for a Robot."

Current Research in Problem Solving and Machine Perception Seminar
held at the University of California, Department of Psychology, San
Diego, California, April 17, 1972. Attendees from SRI: R. E. Fikes,
B. Raphael, and J. M. Tenenbaum.

R. E. Fikes, seminar talk entitled "New Experiments with Shakey the 
Robot" at the University of California, Irvine, California, June 2, 
1972. 

P. E. Hart, seminar talk entitled "A Short Survey of Artificial Intelligence
and Robots," at Jet Propulsion Laboratory, Pasadena,
California, June 30, 1972.

C. A. Rosen, seminar talk entitled "Robots, Productivity, and
Quality," Jet Propulsion Laboratory, Pasadena, California, June 30,
1972.

J. M. Tenenbaum, talk on robotics to ACM Peninsula Chapter, July 13, 
1972. 

Remotely Manned Systems Conference, Pasadena, California, September
13-15, 1972; sponsored by the National Aeronautics and Space Administration
and the California Institute of Technology. Attendees from
SRI: P. E. Hart, N. J. Nilsson, and C. A. Rosen.